Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/02 08:03:51 UTC

[GitHub] [spark] sandeep-katta opened a new pull request, #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

sandeep-katta opened a new pull request, #38874:
URL: https://github.com/apache/spark/pull/38874

   
   
   ### What changes were proposed in this pull request?
   [[SPARK-41235](https://issues.apache.org/jira/browse/SPARK-41235)] Adds support for the `array_compact` API implementation.
   
   ### Why are the changes needed?
   A new API is added to remove null elements from the given array.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, a new API `array_compact` is added, which can be used in SQL.
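   
   For reference, a minimal usage sketch in spark-shell (assuming the API lands as described in this PR; the alias is illustrative):
   ```
   // SQL entry point for the new function.
   val compacted = spark.sql("SELECT array_compact(array(1, 2, 3, null)) AS c")
   compacted.show(false)  // expected: [1, 2, 3]
   ```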
   
   
   ### How was this patch tested?
   Added unit tests
   




[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038723116


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   I think we can leave the current behaviour as is, since we have the `transform` API which can be used in such cases, e.g.:
   ```
   scala> val df_test = Seq(Seq(Array[Integer](1, null, 3), null, Array[Integer](null, 2, 3))).toDF("a")
   df_test: org.apache.spark.sql.DataFrame = [a: array<array<int>>]
   
   scala> df_test.select(array_compact(transform($"a", x => array_compact(x))).as("t")).show(false)
   +----------------+
   |t               |
   +----------------+
   |[[1, 3], [2, 3]]|
   +----------------+
   ```
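   
   The same rewrite expressed through SQL's higher-order `transform` (a sketch; the `x -> ...` lambda syntax is standard Spark SQL):
   ```
   // Equivalent SQL form of the DataFrame expression above.
   spark.sql("""
     SELECT array_compact(transform(a, x -> array_compact(x))) AS t
     FROM VALUES (array(array(1, null, 3), null, array(null, 2, 3))) AS tab(a)
   """).show(false)
   // expected: [[1, 3], [2, 3]]
   ```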





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039205150


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("Array Compact") {
+    val a0 = Literal.create(Seq(1, null, 3, null, 2, 5), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", null, "c", "b"), ArrayType(StringType))
+    val a2 = Literal.create(Seq[String](null, "", null, ""), ArrayType(StringType))
+    val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a4 = Literal.create(null, ArrayType(StringType))
+    val a5 = Literal.create(Seq(1, null, 8, 9, null), ArrayType(IntegerType))
+    val a6 = Literal.create(Seq(true, false, null, true), ArrayType(BooleanType))
+    val a7 = Literal.create(Seq(null, null), ArrayType(IntegerType))
+
+    checkEvaluation(ArrayCompact(a0), Seq(1, 3, 2, 5))
+    checkEvaluation(ArrayCompact(a1), Seq( "a", "b", "c", "b"))
+    checkEvaluation(ArrayCompact(a2), Seq( "", ""))
+    checkEvaluation(ArrayCompact(a3), Seq.empty[Integer])
+    checkEvaluation(ArrayCompact(a4), null)
+    checkEvaluation(ArrayCompact(a5), Seq(1, 8, 9))
+    checkEvaluation(ArrayCompact(a6), Seq(true, false, true))
+    checkEvaluation(ArrayCompact(a7), Seq.empty[Integer])
+
+    // complex data types
+    val binary_arr = Literal.create(Seq[Array[Byte]](null, Array[Byte](1, 2)),

Review Comment:
   done





[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1343734796

   > @beliefer would you mind also help reviewing? Thanks
   
   Thank you for your ping.




[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047051620


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression): ArrayCompact = {

Review Comment:
   Do we really need `apply`?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty

Review Comment:
   Then the `makeSQLString` of `InheritAnalysisRules` will be correct.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty

Review Comment:
   ```suggestion
     override def parameters: Seq[Expression] = Seq(expr)
   ```





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047086983


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty

Review Comment:
   FYI, even with this suggested change we still get the `Cannot resolve "filter(a, lambdafunction((x_0 IS NOT NULL), x_0))"` error. The only difference I am seeing is in the `Project` node:
   
   With `Seq.empty`:
   
   ```
   [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "filter(a, lambdafunction((x_0 IS NOT NULL), x_0))" due to data type mismatch: Parameter 1 requires the "ARRAY" type, however "a" has the type "INT".;
   'Project [unresolvedalias(array_compact(), Some(org.apache.spark.sql.Column$$Lambda$1642/1906635841@124eb83d))]
   +- Project [value#1 AS a#4]
      +- LocalRelation [value#1]
   ```
   With `Seq(expr)`:
   ```
   [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "filter(a, lambdafunction((x_0 IS NOT NULL), x_0))" due to data type mismatch: Parameter 1 requires the "ARRAY" type, however "a" has the type "INT".;
   'Project [unresolvedalias(array_compact('a), Some(org.apache.spark.sql.Column$$Lambda$1642/48143526@7741ae1b))]
   +- Project [value#1 AS a#4]
      +- LocalRelation [value#1]
   ```
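   
   A minimal reproducer for the analysis error being compared (it mirrors the `invalid_datatype_df` input used later in this thread):
   ```
   // 'a' is an INT column, not an ARRAY, so resolution fails with
   // DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE in both variants.
   val df = Seq(1, 2, 3).toDF("a")
   df.selectExpr("array_compact(a)").show()
   ```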





[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1343778634

   > Basically, the implementation looks good. But we can use `RuntimeReplaceable` to simplify this PR.
   > 
   > `ArrayCompact` can reuse `ArrayRemove` and implement `RuntimeReplaceable`.
   > 
   > ```
   >       > SELECT array_compact(array(1, 2, 3, null));
   >         [1,2,3]
   > ```
   > 
   > equals to
   > 
   > ```
   >       > SELECT array_remove(array(1, 2, 3, null), null);
   >         [1,2,3]
   > ```
   
   As per the current behaviour, `array_remove` returns `null` if one of the arguments is `null`:
   ```
   spark-sql> SELECT array_remove(array(1, 2, 3, null), null);
   NULL
   ```
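   
   The two behaviours side by side in spark-shell (a sketch):
   ```
   // array_remove is null-intolerant in its second argument, so passing a
   // null element makes the whole expression null.
   spark.sql("SELECT array_remove(array(1, 2, 3, null), null)").show()  // NULL
   // array_compact is the dedicated way to drop top-level nulls.
   spark.sql("SELECT array_compact(array(1, 2, 3, null))").show()       // [1, 2, 3]
   ```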




[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1340524315

   > Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go.
   
   @HyukjinKwon @zhengruifeng The Scala part looks good to me; please review further, thanks ~




[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1336792961

   > > Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql`
   > 
   > Thanks @LuciferYang for the review. I will add SQL tests, and in `DataFrameFunctionsSuite` there is an input to test the invalid data type. So could you please let me know what other invalid inputs you want me to cover? `val invalid_datatype_df = Seq(1, 2, 3).toDF("a")`
   
   Sorry, I missed this one.




[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1043982018


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,57 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())
+    var pos = 0
+    var hasNull = false
+    array.asInstanceOf[ArrayData].foreach(elementType, (index, v) =>
+      // add elements only if the source has null
+      if (v != null && hasNull) {
+        newArray(pos) = v
+        pos += 1
+      } else if (v == null && !hasNull) {
+        hasNull = true
+        // source has null elements, so copy the elements to newArray
+        for(i <- 0 until index) {
+          newArray(pos) = array.asInstanceOf[ArrayData].get(i, elementType)
+          pos += 1
+        }
+      }
+    )
+    if (hasNull) {
+      new GenericArrayData(newArray.slice(0, pos))
+    } else {
+      array
+    }
+  }
+  override def prettyName: String = "array_compact"
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+    nullSafeCodeGen(ctx, ev, array => {
+      val expr = ctx.addReferenceObj("arrayCompactExpr", this)
+      s"${ev.value} = (ArrayData)$expr.nullSafeEval($array);"

Review Comment:
   This implementation is a bit hacky. Please see the discussion at https://github.com/apache/spark/pull/38865#discussion_r1043977011





[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1343723426

   @sandeep-katta Could you update the PR description and add information covering the syntax, arguments, examples, and which mainstream databases support `array_compact`? Please refer to https://github.com/apache/spark/pull/38865




[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046710242


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("Array Compact") {

Review Comment:
   done





[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1357095310

   friendly ping @cloud-fan @gengliangwang @HyukjinKwon  @zhengruifeng




[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1048131718


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,15 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+create temporary view invalid_datatype as select * from values
+(1), (2), (3)
+as invalid_datatype(id);
+
+select array_compact(id) from invalid_datatype;
+select array_compact(array("1", null, "2", null));
+select array_compact(array("1", null, "2", null));
+select array_compact(array(1D, null, 2D, null));
+select array_compact(array(array(1, 2, 3, null), null, array(4, null, 6)));
+select array_compact(array(null));

Review Comment:
   Done
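   
   For reference, the expected results for the queries above, assuming the agreed semantics (top-level nulls removed, nested array contents untouched), shown here via spark-shell:
   ```
   spark.sql("""select array_compact(array("1", null, "2", null))""").show(false)
   // ["1", "2"]
   spark.sql("select array_compact(array(1D, null, 2D, null))").show(false)
   // [1.0, 2.0]
   spark.sql("select array_compact(array(array(1, 2, 3, null), null, array(4, null, 6)))").show(false)
   // [[1, 2, 3, null], [4, null, 6]]
   spark.sql("select array_compact(array(null))").show(false)
   // []
   ```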





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046748387


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLamdbaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression) = {
+    new ArrayCompact(expr, ArrayFilter(expr, createLamdbaExpr()))
+  }
+
+  def createLamdbaExpr() = {

Review Comment:
   should be private?
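   
   For readers following the thread, a hypothetical sketch of what this lambda builder constructs (the variable names and `freshVarName` usage are assumptions; the `lambdafunction((x_0 IS NOT NULL), x_0)` strings in the error messages quoted elsewhere in this thread show the resulting shape):
   ```
   import org.apache.spark.sql.catalyst.expressions.{
     IsNotNull, LambdaFunction, UnresolvedNamedLambdaVariable}

   // Hypothetical: builds the `x -> x IS NOT NULL` predicate that
   // ArrayFilter applies to drop null elements.
   private def createLambdaExpr(): LambdaFunction = {
     val lv = UnresolvedNamedLambdaVariable(
       Seq(UnresolvedNamedLambdaVariable.freshVarName("x")))
     LambdaFunction(IsNotNull(lv), Seq(lv))
   }
   ```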





[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1351958535

   @HyukjinKwon the pipeline is failing with the below error, which has no relation to this PR. I also rebased onto the latest master, but it is still failing. How do I fix it?
   ```
   Error:  The build could not read 25 projects -> [Help 1]
   [85](https://github.com/sandeep-katta/spark/actions/runs/3696575472/jobs/6260493319#step:22:86)
   Error:    
   [86](https://github.com/sandeep-katta/spark/actions/runs/3696575472/jobs/6260493319#step:22:87)
   Error:    The project org.apache.spark:spark-sketch_2.12:3.4.0-SNAPSHOT (/__w/spark/spark/common/sketch/pom.xml) has 1 error
   [87](https://github.com/sandeep-katta/spark/actions/runs/3696575472/jobs/6260493319#step:22:88)
   Error:      Non-resolvable parent POM for org.apache.spark:spark-sketch_2.12:3.4.0-SNAPSHOT: Could not find artifact org.apache.spark:spark-parent_2.12:pom:3.4.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 22, column 11 -> [Help 2]
   ```




[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038154998


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("SPARK-41235: Array Compact") {
+    val a0 = Literal.create(Seq(1, null, 3, null, 2, 5), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", null, "c", "b"), ArrayType(StringType))
+    val a2 = Literal.create(Seq[String](null, "", null, ""), ArrayType(StringType))
+    val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a4 = Literal.create(null, ArrayType(StringType))
+    val a5 = Literal.create(Seq(1, null, 8, 9, null), ArrayType(IntegerType))
+    val a6 = Literal.create(Seq(true, false, null, true), ArrayType(BooleanType))
+    val a7 = Literal.create(Seq(null, null), ArrayType(IntegerType))
+
+    checkEvaluation(ArrayCompact(a0), Seq(1, 3, 2, 5))
+    checkEvaluation(ArrayCompact(a1), Seq( "a", "b", "c", "b"))
+    checkEvaluation(ArrayCompact(a2), Seq( "", ""))
+    checkEvaluation(ArrayCompact(a3), Seq.empty[Integer])
+    checkEvaluation(ArrayCompact(a4), null)
+    checkEvaluation(ArrayCompact(a5), Seq(1, 8, 9))
+    checkEvaluation(ArrayCompact(a6), Seq(true, false, true))
+    checkEvaluation(ArrayCompact(a7), Seq.empty[Integer])

Review Comment:
   Should add a test: `input is null and result is null`





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1045736066


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {

Review Comment:
   I tried this; it gives a `StackOverflowError`.
   
   Reason:
   `child` is an overridden method which in turn returns `replacement`:
   ```
   trait InheritAnalysisRules extends UnaryLike[Expression] { self: RuntimeReplaceable =>
     override def child: Expression = replacement
   }
   ```
   
   If I try to use `expr.dataType`, it gives an `UnresolvedException`, which is correct because `replacement` is not yet resolved.





[GitHub] [spark] zhengruifeng commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038114068


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   I personally think the output `[[1, null, 3], [null, 2, 3]]` is expected; let me confirm it.





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038150138


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   I have updated the unit tests to reflect this behaviour. If we are required to change it, then I will update the code and UTs.





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038199014


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("SPARK-41235: Array Compact") {
+    val a0 = Literal.create(Seq(1, null, 3, null, 2, 5), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", null, "c", "b"), ArrayType(StringType))
+    val a2 = Literal.create(Seq[String](null, "", null, ""), ArrayType(StringType))
+    val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a4 = Literal.create(null, ArrayType(StringType))
+    val a5 = Literal.create(Seq(1, null, 8, 9, null), ArrayType(IntegerType))
+    val a6 = Literal.create(Seq(true, false, null, true), ArrayType(BooleanType))
+    val a7 = Literal.create(Seq(null, null), ArrayType(IntegerType))
+
+    checkEvaluation(ArrayCompact(a0), Seq(1, 3, 2, 5))
+    checkEvaluation(ArrayCompact(a1), Seq( "a", "b", "c", "b"))
+    checkEvaluation(ArrayCompact(a2), Seq( "", ""))
+    checkEvaluation(ArrayCompact(a3), Seq.empty[Integer])
+    checkEvaluation(ArrayCompact(a4), null)
+    checkEvaluation(ArrayCompact(a5), Seq(1, 8, 9))
+    checkEvaluation(ArrayCompact(a6), Seq(true, false, true))
+    checkEvaluation(ArrayCompact(a7), Seq.empty[Integer])

Review Comment:
   ok





[GitHub] [spark] HyukjinKwon commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1336598962

   Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go.




[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1336789676

   > Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql`
   
   Thanks @LuciferYang for the review. I will add SQL tests, and in `DataFrameFunctionsSuite` there is an input to test the invalid data type. So could you please let me know what other invalid inputs you want me to cover?
   `val invalid_datatype_df = Seq(1, 2, 3).toDF("a")`
   
   




[GitHub] [spark] zhengruifeng commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039470282


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   In this case, I think we do not need to exactly match Snowflake.
   
   I think we should refer to `ARRAY_DISTINCT`, which deduplicates top-level elements but does not deduplicate within its element arrays:
   
   ```
   scala> spark.sql(""" SELECT array_distinct(array(a,b,c)) FROM VALUES (ARRAY(1,1,2), ARRAY(8,8), ARRAY(1,1,2)) AS tab(a,b,c) """).head
   res19: org.apache.spark.sql.Row = [WrappedArray(WrappedArray(1, 1, 2), WrappedArray(8, 8))]
   ```
   
   also cc @HyukjinKwon @cloud-fan 





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038030055


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   if the input is `Array(Array(1,null,3), null, Array(null, 2, 3))`, what is the result?
   
   





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1048130574


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression): ArrayCompact = {

Review Comment:
   Updated, Thank you





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047984704


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##########
@@ -704,6 +704,7 @@ object FunctionRegistry {
     expression[MapZipWith]("map_zip_with"),
     expression[ZipWith]("zip_with"),
     expression[Get]("get"),
+    expression[ArrayCompact]("array_compact"),

Review Comment:
   Please put the `ArrayCompact` and other `Array*` functions together.





[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1353005772

   > @HyukjinKwon the pipeline is failing with the below error, which has no relation to this PR. I also rebased onto the latest master, but it is still failing. How do I fix it?
   > 
   > ```
   > Error:  The build could not read 25 projects -> [Help 1]
   > [85](https://github.com/sandeep-katta/spark/actions/runs/3696575472/jobs/6260493319#step:22:86)
   > Error:    
   > [86](https://github.com/sandeep-katta/spark/actions/runs/3696575472/jobs/6260493319#step:22:87)
   > Error:    The project org.apache.spark:spark-sketch_2.12:3.4.0-SNAPSHOT (/__w/spark/spark/common/sketch/pom.xml) has 1 error
   > [87](https://github.com/sandeep-katta/spark/actions/runs/3696575472/jobs/6260493319#step:22:88)
   > Error:      Non-resolvable parent POM for org.apache.spark:spark-sketch_2.12:3.4.0-SNAPSHOT: Could not find artifact org.apache.spark:spark-parent_2.12:pom:3.4.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 22, column 11 -> [Help 2]
   > ```
   
   @sandeep-katta please rebase the code; this issue is already fixed.




[GitHub] [spark] zhengruifeng commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1057161110


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   ```
   select array_compact(array_construct(array_construct(1, null), null));
   +----------------------------------------------------------------+
   | ARRAY_COMPACT(ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(1, NULL), NULL)) |
   |----------------------------------------------------------------|
   | [                                                              |
   |   [                                                            |
   |     1,                                                         |
   |     undefined                                                  |
   |   ]                                                            |
   | ]                                                              |
   +----------------------------------------------------------------+
   1 Row(s) produced. Time Elapsed: 0.112s
   ```
   
   FYI, we confirmed that Snowflake's `array_compact` does not remove NULL values inside its element arrays.
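   
   In Spark SQL terms, the same check would look like this (a sketch, assuming the implementation keeps the agreed semantics):
   ```
   // Only the top-level null is dropped; the nested null is kept.
   spark.sql("SELECT array_compact(array(array(1, null), null))").show(false)
   // expected: [[1, null]]
   ```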





[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1336834908

   @LuciferYang I added SQL tests; could you please review again?




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1041775256


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   +1 for @zhengruifeng for now





[GitHub] [spark] zhengruifeng commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1042177641


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,44 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())
+    var pos = 0
+    array.asInstanceOf[ArrayData].foreach (elementType, (i, v) =>
+      if (v != null ) {
+        newArray(pos) = v
+        pos += 1
+      }
+    )
+    new GenericArrayData(newArray.slice(0, pos))

Review Comment:
   nit: I think we can add a new variable, `hasNull`.
   
   We only need the assignment `newArray(pos) = v` and the slice `newArray.slice(0, pos)` when `hasNull` is true; see the sketch below.
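   
   A compact sketch of the suggested shape (it matches the revised `nullSafeEval` quoted elsewhere in this thread):
   ```
   // Copy-on-first-null: do nothing until a null is seen, then copy the
   // prefix before it and keep appending only non-null values afterwards.
   override def nullSafeEval(array: Any): Any = {
     val data = array.asInstanceOf[ArrayData]
     val newArray = new Array[Any](data.numElements())
     var pos = 0
     var hasNull = false
     data.foreach(elementType, (index, v) =>
       if (v != null && hasNull) {
         newArray(pos) = v
         pos += 1
       } else if (v == null && !hasNull) {
         hasNull = true
         for (i <- 0 until index) {
           newArray(pos) = data.get(i, elementType)
           pos += 1
         }
       })
     if (hasNull) new GenericArrayData(newArray.slice(0, pos)) else array
   }
   ```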





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038197724


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   Got it, thanks





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046747994


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLamdbaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression) = {
+    new ArrayCompact(expr, ArrayFilter(expr, createLamdbaExpr()))
+  }
+
+  def createLamdbaExpr() = {

Review Comment:
   Typo: Lambda





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1045464676


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {

Review Comment:
   I tried to extend `InheritAnalysisRules`, but my tests fail with the exception below, and I am not sure what mistake I am making. Could you guide me here, please?
   
   ```
   Max iterations (100) reached for batch Resolution, please set 'spark.sql.analyzer.maxIterations' to a larger value.
   java.lang.RuntimeException: Max iterations (100) reached for batch Resolution, please set 'spark.sql.analyzer.maxIterations' to a larger value.
   ```
   
   Updated Code:
   ```
   case class ArrayCompact(expr: Expression)
     extends RuntimeReplaceable with InheritAnalysisRules {
   
     lazy val isNotNull: Expression => Expression = x => IsNotNull(x)
     lazy val lv = UnresolvedNamedLambdaVariable(Seq(UnresolvedNamedLambdaVariable.freshVarName("x")))
     lazy val lambda = LambdaFunction(isNotNull(lv), Seq(lv))
   
     lazy override val replacement: Expression = ArrayFilter(expr, lambda)
   
     override def parameters: Seq[Expression] = Seq(expr)
   
     override def prettyName: String = "array_compact"
   
     override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
       copy(expr = newChild)
   }
   ```
   





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046697980


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))
+      },
+      errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+      parameters = Map(
+        "sqlExpr" ->  """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)"""",

Review Comment:
   @beliefer for the `DATATYPE_MISMATCH` error, the exception reports the expression as `filter` instead of `array_compact`. Let me know whether this is okay.





[GitHub] [spark] cloud-fan commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1365589473

   thanks, merging to master!




[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1365587749

   Should we merge this one first? The PRs for the other 3 new functions may need to resolve conflicts after this one is merged.
   
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038153408


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("SPARK-41235: Array Compact") {

Review Comment:
   This is not a bug fix, so it is not necessary to add `SPARK-41235` to the test case name





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039256946


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+select
+  array_compact(boolean_array),
+  array_compact(tinyint_array),
+  array_compact(smallint_array),
+  array_compact(int_array),
+  array_compact(bigint_array),
+  array_compact(decimal_array),
+  array_compact(double_array),
+  array_compact(float_array),
+  array_compact(date_array),
+  array_compact(timestamp_array)
+from primitive_arrays;
+select array_compact(array("1", null, "2", null));
+select array_compact(array(1D, null, 2D, null));
+select array_compact(array(array(1, 2, 3, null), null, array(4, null, 6)));
+select array_compact(array(null));

Review Comment:
   a reminder on this, @sandeep-katta





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038027808


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)

Review Comment:
   hmm... why does `elementType` need to be `orderable`?





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038192592


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)

Review Comment:
   `ArrayCompact` extends `ExpectsInputTypes`, so I believe this data type check will be taken care of by the superclass.





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046572899


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {

Review Comment:
   You can refer to https://github.com/apache/spark/blob/d2212e352043063b14cac8ec53fe65731ba593f5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L2500





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047198481


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression): ArrayCompact = {

Review Comment:
   +1, Agree with @beliefer 
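   
   For reference, a quick usage sketch of the expression in this near-final shape (spark-shell; the expected row follows the `ExpressionDescription` example in the diff above):
   
   ```
   // array_compact drops top-level nulls from the array.
   spark.sql("SELECT array_compact(array(1, 2, 3, null)) AS compacted").show(false)
   // expected, per the docstring example: [1, 2, 3]
   ```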





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1048130169


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))
+      },
+      errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+      parameters = Map(
+        "sqlExpr" ->  """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)"""",

Review Comment:
   What I understand is that if a class extends `InheritAnalysisRules`, the `replacement` object is used for all the analysis rules, so even if I implement `def checkInputDataTypes(): TypeCheckResult` it will be dead code.
   
   And also as per the guideline [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L325-L355)
   
   ```
   // It's much simpler if the SQL function can be implemented with existing expression(s). There are
   // a few cases:
   
   //   - The function can be implemented by combining some existing expressions. We can use
   //     `RuntimeReplaceable` to define the combination. See `ParseToDate` as an example.
   //     To inherit the analysis behavior from the replacement expression
   //     mix-in `InheritAnalysisRules` with `RuntimeReplaceable`. See `TryAdd` as an example.
   
   ```
   
   So if we want the analysis rules to be applied to `ArrayCompact` itself, it should `extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant`






[GitHub] [spark] cloud-fan closed pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation
URL: https://github.com/apache/spark/pull/38874




[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038123911


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   @zhengruifeng I think it is better to give some typical cases in the JIRA. Personally, I find Snowflake's documentation for this function not very clear :(
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038159883


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)

Review Comment:
   I think we need to override the `checkInputDataTypes` function. Per the documentation, `the input should be either an ARRAY data type or a VARIANT data type containing an array value` or `null`; other types should return the corresponding `DataTypeMismatch`.
   
   





[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1336772413

   > I personally think the output `[[1, null, 3], [null, 2, 3]]` is expected, let me confirm it.
   
   If this case is correct, I think the Scala part is OK except for the lack of an `invalid input` test @HyukjinKwon
   
   




[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038074805


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)

Review Comment:
   Not required, will remove this. Thank you





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1045729620


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {

Review Comment:
   Please try using this approach:
   ```
       lazy val ArrayType(elementType, nullable) = child.dataType
       lazy val lv = NamedLambdaVariable("arg", elementType, nullable)
       lazy val function = f(lv)  // `f` is the predicate to apply, e.g. `IsNotNull(_)`
       lazy val lambda = LambdaFunction(function, Seq(lv))
   ```
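   
   Filled out as a complete helper, one possible reading of this suggestion (the helper name is made up, and `IsNotNull` stands in for the `f` predicate above):
   
   ```
   import org.apache.spark.sql.catalyst.expressions._
   import org.apache.spark.sql.types.ArrayType
   
   // Build a resolved lambda `arg -> arg IS NOT NULL` over the array's
   // element type, then wrap it in ArrayFilter as the replacement expression.
   def buildCompactReplacement(child: Expression): Expression = {
     val ArrayType(elementType, containsNull) = child.dataType
     val lv = NamedLambdaVariable("arg", elementType, containsNull)
     val lambda = LambdaFunction(IsNotNull(lv), Seq(lv))
     ArrayFilter(child, lambda)
   }
   ```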





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047987567


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,15 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+create temporary view invalid_datatype as select * from values

Review Comment:
   The temp view is used just once. We can remove it.



##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,15 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+create temporary view invalid_datatype as select * from values
+(1), (2), (3)
+as invalid_datatype(id);
+
+select array_compact(id) from invalid_datatype;

Review Comment:
   So, just pass one int value here.



##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,15 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+create temporary view invalid_datatype as select * from values
+(1), (2), (3)
+as invalid_datatype(id);
+
+select array_compact(id) from invalid_datatype;
+select array_compact(array("1", null, "2", null));
+select array_compact(array("1", null, "2", null));

Review Comment:
   The case is duplicated.



##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))

Review Comment:
   ```suggestion
         Seq(Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))

Review Comment:
   ditto.



##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,15 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+create temporary view invalid_datatype as select * from values
+(1), (2), (3)
+as invalid_datatype(id);
+
+select array_compact(id) from invalid_datatype;
+select array_compact(array("1", null, "2", null));
+select array_compact(array("1", null, "2", null));
+select array_compact(array(1D, null, 2D, null));
+select array_compact(array(array(1, 2, 3, null), null, array(4, null, 6)));
+select array_compact(array(null));

Review Comment:
   Could you add a case where the input array contains no nulls?



##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )

Review Comment:
   ```suggestion
         Seq(Row(Seq(1.0, 2.0)))
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")

Review Comment:
   ```suggestion
       val invalidDatatypeDF = Seq(1, 2, 3).toDF("a")
   ```





[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1343731612

   Basically, the implementation looks good. But we can use `RuntimeReplaceable` to simplify this PR.
   
   `ArrayCompact` can reuse `ArrayRemove` and implement `RuntimeReplaceable`.
   ```
         > SELECT array_compact(array(1, 2, 3, null));
           [1,2,3]
   ```
   equals to
   ```
         > SELECT array_remove(array(1, 2, 3, null), null);
           [1,2,3]
   ```




[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1343813646

   OK. Another way, since `array_remove(..., null)` itself returns NULL (`ArrayRemove` is `NullIntolerant`): `ArrayCompact` can reuse `ArrayFilter` and implement `RuntimeReplaceable`.
   ```
         > SELECT filter(array(1, 2, 3, null), x -> x IS NOT NULL);
           [1,2,3]
   ```
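   
   The same thing via the DataFrame API, for comparison (a spark-shell sketch; the column name `a` is illustrative):
   
   ```
   import org.apache.spark.sql.functions._
   
   // filter(a, x -> x IS NOT NULL) is the building block array_compact reuses.
   val df = Seq(Tuple1(Array[Integer](1, 2, 3, null))).toDF("a")
   df.select(filter($"a", _.isNotNull).as("compacted")).show(false)
   // expected: [1, 2, 3]
   ```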




[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1340874554

   @sandeep-katta it seems we need to add the new function name to `functions.rst`? I'm not familiar with the Python part
   
    




[GitHub] [spark] zhengruifeng commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1340941717

   
   @beliefer would you mind also helping review? Thanks
   
   




[GitHub] [spark] sandeep-katta0102 commented on pull request #38874: [SPARK-41235][SQL][PYTHON]: High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta0102 commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1334886794

   cc @HyukjinKwon @zhengruifeng 




[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039276727


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+select
+  array_compact(boolean_array),
+  array_compact(tinyint_array),
+  array_compact(smallint_array),
+  array_compact(int_array),
+  array_compact(bigint_array),
+  array_compact(decimal_array),
+  array_compact(double_array),
+  array_compact(float_array),
+  array_compact(date_array),
+  array_compact(timestamp_array)
+from primitive_arrays;
+select array_compact(array("1", null, "2", null));
+select array_compact(array(1D, null, 2D, null));
+select array_compact(array(array(1, 2, 3, null), null, array(4, null, 6)));
+select array_compact(array(null));

Review Comment:
   Done





[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1336766817

   Please add some invalid-input test cases @sandeep-katta, and add some SQL tests to `src/test/resources/sql-tests/inputs/array.sql`
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039216402


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+select
+  array_compact(boolean_array),
+  array_compact(tinyint_array),
+  array_compact(smallint_array),
+  array_compact(int_array),
+  array_compact(bigint_array),
+  array_compact(decimal_array),
+  array_compact(double_array),
+  array_compact(float_array),
+  array_compact(date_array),
+  array_compact(timestamp_array)
+from primitive_arrays;
+select array_compact(array("1", null, "2", null));
+select array_compact(array(1D, null, 2D, null));
+select array_compact(array(array(1, 2, 3, null), null, array(4, null, 6)));
+select array_compact(array(null));

Review Comment:
   A blank line is required at the end of the file
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039215759


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+select

Review Comment:
   I think we should change this case to one corresponding to
   
   ```
   val invalid_datatype_df = Seq(1, 2, 3).toDF("a")
   ```
   
   the others are OK to me





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046749466


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression) = {

Review Comment:
   Public members need type annotations.





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1048151279


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))
+      },
+      errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+      parameters = Map(
+        "sqlExpr" ->  """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)"""",

Review Comment:
   OK. If so, please remove `InheritAnalysisRules` and use `UnaryLike[Expression]` and `ImplicitCastInputTypes` instead.
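   
   For reference, a rough sketch of that shape (a sketch only, not the PR's actual code; the lambda construction and names here are illustrative), following the pattern other `RuntimeReplaceable` expressions in Spark use:
   
   ```
   case class ArrayCompact(child: Expression)
     extends RuntimeReplaceable with UnaryLike[Expression] with ImplicitCastInputTypes {
   
     override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
   
     // The input stays the child, so type coercion and input-type checks run
     // against `child` and analysis errors can mention array_compact itself.
     override lazy val replacement: Expression = {
       val elementType = child.dataType.asInstanceOf[ArrayType].elementType
       val lv = NamedLambdaVariable("arg", elementType, nullable = true)
       ArrayFilter(child, LambdaFunction(IsNotNull(lv), Seq(lv)))
     }
   
     override def prettyName: String = "array_compact"
   
     override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
       copy(child = newChild)
   }
   ```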





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047986108


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,36 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));

Review Comment:
   Could you add a case where the input array has no nulls?





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1046773157


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))
+      },
+      errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+      parameters = Map(
+        "sqlExpr" ->  """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)"""",

Review Comment:
   `"sqlExpr" -> """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)""""` looks strange; the replacement is internal behavior, and it will confuse the end user when it appears in the error message.
   
   





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047047837


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))
+      },
+      errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+      parameters = Map(
+        "sqlExpr" ->  """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)"""",

Review Comment:
   IIUC it is because `ArrayCompact` extends `InheritAnalysisRules`, and per its documentation, "It makes `replacement` the child of the expression, to inherit the analysis rules for it such as type coercion". So all the data type checks are done on the `child` expression.





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047086983


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty

Review Comment:
   FYI, even with this suggested change we still get the `Cannot resolve "filter(a, lambdafunction((x_0 IS NOT NULL), x_0))"` error. The only difference I am seeing is in the `Project` node:
   
   With `parameters = Seq.empty`:
   
   ```
   [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "filter(a, lambdafunction((x_0 IS NOT NULL), x_0))" due to data type mismatch: Parameter 1 requires the "ARRAY" type, however "a" has the type "INT".;
   'Project [unresolvedalias(array_compact(), Some(org.apache.spark.sql.Column$$Lambda$1642/1906635841@124eb83d))]
   +- Project [value#1 AS a#4]
      +- LocalRelation [value#1]
   ```
   With `parameters = Seq(expr)`:
   ```
   [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "filter(a, lambdafunction((x_0 IS NOT NULL), x_0))" due to data type mismatch: Parameter 1 requires the "ARRAY" type, however "a" has the type "INT".;
   'Project [unresolvedalias(array_compact('a), Some(org.apache.spark.sql.Column$$Lambda$1642/48143526@7741ae1b))]
   +- Project [value#1 AS a#4]
      +- LocalRelation [value#1]
   ```





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1045194337


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {

Review Comment:
   We should remove `UnaryLike`, `ExpectsInputTypes` and `NullIntolerant` here and implement `InheritAnalysisRules`.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {
+
+  val isNotNull: Expression => Expression = x => IsNotNull(x)

Review Comment:
   ```suggestion
     lazy val isNotNull: Expression => Expression = x => IsNotNull(x)
   ```



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("Array Compact") {

Review Comment:
   We can remove these tests now.





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047078733


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression): ArrayCompact = {

Review Comment:
   It seems that `apply` is not necessary. cc @LuciferYang @cloud-fan





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047981989


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
       )
     )
   }
+
+  test("test array_compact") {
+    val df = Seq(
+      (Array[Integer](null, 1, 2, null, 3, 4),
+        Array("a", null, "b", null, "c", "d"), Array("", "")),
+      (Array.empty[Integer], Array("1.0", "2.2", "3.0"), Array.empty[String]),
+      (Array[Integer](null, null, null), null, null)
+    ).toDF("a", "b", "c")
+
+    checkAnswer(
+      df.select(array_compact($"a"),
+        array_compact($"b"), array_compact($"c")),
+      Seq(Row(Seq(1, 2, 3, 4), Seq("a", "b", "c", "d"), Seq("", "")),
+        Row(Seq.empty[Integer], Seq("1.0", "2.2", "3.0"), Seq.empty[String]),
+        Row(Seq.empty[Integer], null, null))
+    )
+
+    checkAnswer(
+      OneRowRelation().selectExpr("array_compact(array(1.0D, 2.0D, null))"),
+      Seq(
+        Row(Seq(1.0, 2.0))
+      )
+    )
+
+    // complex data type
+    checkAnswer(
+      OneRowRelation().
+        selectExpr("array_compact(array(array(1, null,3), null, array(null, 2, 3)))"),
+      Seq(
+        Row(Seq(Seq(1, null, 3), Seq(null, 2, 3))))
+    )
+
+    // unsupported data type
+    val invalid_Datatype_df = Seq(1, 2, 3).toDF("a")
+    checkErrorMatchPVals(
+      exception = intercept[AnalysisException] {
+        invalid_Datatype_df.select(array_compact($"a"))
+      },
+      errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+      parameters = Map(
+        "sqlExpr" ->  """"filter\(a, lambdafunction\(\(x_\d+ IS NOT NULL\), x_\d+\)\)"""",

Review Comment:
   You can still implement `def checkInputDataTypes(): TypeCheckResult`.
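   
   For illustration, a minimal sketch of such an override, assuming the `expr` field from the diff above; it uses the plain `TypeCheckFailure` form to keep the sketch short, whereas the production code would likely go through Spark's error-class API:
   
   ```
   override def checkInputDataTypes(): TypeCheckResult = expr.dataType match {
     case _: ArrayType => TypeCheckResult.TypeCheckSuccess
     case other =>
       // report against array_compact itself instead of the internal filter(...)
       TypeCheckResult.TypeCheckFailure(
         s"argument 1 requires array type, however, '${expr.sql}' is of " +
           s"${other.simpleString} type.")
   }
   ```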





[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047984704


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##########
@@ -704,6 +704,7 @@ object FunctionRegistry {
     expression[MapZipWith]("map_zip_with"),
     expression[ZipWith]("zip_with"),
     expression[Get]("get"),
+    expression[ArrayCompact]("array_compact"),

Review Comment:
   Please put `ArrayCompact` together with the other Array* functions.





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1045736066


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,30 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends RuntimeReplaceable with UnaryLike[Expression] with ExpectsInputTypes with NullIntolerant {

Review Comment:
   I tried this; it throws a `StackOverflowError`.
   
   Reason:
   `child` is an overridden method which in turn calls `replacement`:
   ```
   trait InheritAnalysisRules extends UnaryLike[Expression] { self: RuntimeReplaceable =>
     override def child: Expression = replacement
   ```
   
   If I try to use `expr.dataType` instead, it gives an `UnresolvedException`, which is correct because `replacement` is not yet resolved.
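   
   A standalone illustration of that cycle, as a plain-Scala toy (names made up, no Spark types involved):
   
   ```
   trait InheritLike {
     def replacement: Int
     def child: Int = replacement   // the trait forwards child to replacement
   }
   
   object Boom extends InheritLike {
     // defining replacement in terms of child re-enters
     // child -> replacement -> child -> ... and never terminates
     def replacement: Int = child + 1
   }
   
   // Boom.child would throw StackOverflowError
   ```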





[GitHub] [spark] beliefer commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1361067943

   @sandeep-katta Could you fix the conflicts?




[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1041772280


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)
+    }
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())

Review Comment:
   I am slicing the array at the end like `new GenericArrayData(newArray.slice(0, pos))`





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1044002198


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,57 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())
+    var pos = 0
+    var hasNull = false
+    array.asInstanceOf[ArrayData].foreach(elementType, (index, v) =>
+      // add elements only if the source has null
+      if (v != null && hasNull) {
+        newArray(pos) = v
+        pos += 1
+      } else if (v == null && !hasNull) {
+        hasNull = true
+        // source has null elements, so copy the elements to newArray
+        for(i <- 0 until index) {
+          newArray(pos) = array.asInstanceOf[ArrayData].get(i, elementType)
+          pos += 1
+        }
+      }
+    )
+    if (hasNull) {
+      new GenericArrayData(newArray.slice(0, pos))
+    } else {
+      array
+    }
+  }
+  override def prettyName: String = "array_compact"
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+    nullSafeCodeGen(ctx, ev, array => {
+      val expr = ctx.addReferenceObj("arrayCompactExpr", this)
+      s"${ev.value} = (ArrayData)$expr.nullSafeEval($array);"

Review Comment:
   Sounds like a good suggestion
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038686688


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   > I personally think the output `[[1, null, 3], [null, 2, 3]]` is expected, let me confirm it.
   
   Is there a clear conclusion?
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1039150323


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("Array Compact") {
+    val a0 = Literal.create(Seq(1, null, 3, null, 2, 5), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", null, "c", "b"), ArrayType(StringType))
+    val a2 = Literal.create(Seq[String](null, "", null, ""), ArrayType(StringType))
+    val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a4 = Literal.create(null, ArrayType(StringType))
+    val a5 = Literal.create(Seq(1, null, 8, 9, null), ArrayType(IntegerType))
+    val a6 = Literal.create(Seq(true, false, null, true), ArrayType(BooleanType))
+    val a7 = Literal.create(Seq(null, null), ArrayType(IntegerType))
+
+    checkEvaluation(ArrayCompact(a0), Seq(1, 3, 2, 5))
+    checkEvaluation(ArrayCompact(a1), Seq( "a", "b", "c", "b"))
+    checkEvaluation(ArrayCompact(a2), Seq( "", ""))
+    checkEvaluation(ArrayCompact(a3), Seq.empty[Integer])
+    checkEvaluation(ArrayCompact(a4), null)
+    checkEvaluation(ArrayCompact(a5), Seq(1, 8, 9))
+    checkEvaluation(ArrayCompact(a6), Seq(true, false, true))
+    checkEvaluation(ArrayCompact(a7), Seq.empty[Integer])
+
+    // complex data types
+    val binary_arr = Literal.create(Seq[Array[Byte]](null, Array[Byte](1, 2)),

Review Comment:
   I prefer to use camelCase





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1047073145


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,40 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+
+  def this(expr: Expression) = this(expr,
+    ArrayFilter(expr, ArrayCompact.createLambdaExpr()))
+
+  override def parameters: Seq[Expression] = Seq.empty
+
+  override def prettyName: String = "array_compact"
+
+  override protected def withNewChildInternal(newChild: Expression): ArrayCompact =
+    copy(replacement = newChild)
+}
+
+object ArrayCompact {
+
+  def apply(expr: Expression): ArrayCompact = {

Review Comment:
   Yes, because in `functions.scala` `ArrayCompact` is created without the `new` operator. If required, I can add the `new` operator and remove `apply` from here, WDYT?
   
   ```
     def array_compact(column: Column): Column = withExpr {
       ArrayCompact(column.expr)
     }
   ```
   to 
   
   ```
     def array_compact(column: Column): Column = withExpr {
       new ArrayCompact(column.expr)
     }
   ```





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1048131081


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,36 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));

Review Comment:
   Added





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1048131415


##########
sql/core/src/test/resources/sql-tests/inputs/array.sql:
##########
@@ -119,3 +119,15 @@ select get(array(1, 2, 3), 0);
 select get(array(1, 2, 3), 3);
 select get(array(1, 2, 3), null);
 select get(array(1, 2, 3), -1);
+
+-- function array_compact
+create temporary view invalid_datatype as select * from values
+(1), (2), (3)
+as invalid_datatype(id);
+
+select array_compact(id) from invalid_datatype;
+select array_compact(array("1", null, "2", null));
+select array_compact(array("1", null, "2", null));

Review Comment:
   Thank you for flagging it; I have updated it.





[GitHub] [spark] infoankitp commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
infoankitp commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1037911293


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)
+    }
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())

Review Comment:
   @sandeep-katta I think the size of the new array can be smaller than `numElements` in the source array. Better to use an `ArrayBuffer`: add the non-null elements into it and pass that along.
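   
   A minimal sketch of that buffer-based variant, under the surrounding `ArrayCompact` expression from this diff (`elementType` and the `ArrayData`/`GenericArrayData` types as above):
   
   ```
   import scala.collection.mutable.ArrayBuffer
   
   override def nullSafeEval(array: Any): Any = {
     val buf = ArrayBuffer.empty[Any]
     array.asInstanceOf[ArrayData].foreach(elementType, (_, v) =>
       // collect only the non-null elements; the buffer grows as needed
       if (v != null) buf += v
     )
     new GenericArrayData(buf.toSeq)
   }
   ```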





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1041845222


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)
+    }
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())

Review Comment:
   I think `Array` is ok





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1041852572


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,44 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+
+  override def nullSafeEval(array: Any): Any = {
+    val newArray = new Array[Any](array.asInstanceOf[ArrayData].numElements())
+    var pos = 0
+    array.asInstanceOf[ArrayData].foreach (elementType, (i, v) =>
+      if (v != null ) {
+        newArray(pos) = v
+        pos += 1
+      }
+    )
+    new GenericArrayData(newArray.slice(0, pos))
+  }
+  override def prettyName: String = "array_compact"
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+    nullSafeCodeGen(ctx, ev, (array) => {

Review Comment:
   nit: `(array)` -> `array`





[GitHub] [spark] sandeep-katta0102 commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta0102 commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038061244


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)

Review Comment:
   ```
   scala>     val df_test = Seq(Seq(Array[Integer](1, null, 3), null, Array[Integer](null, 2, 3))).toDF("a")
   df_test: org.apache.spark.sql.DataFrame = [a: array<array<int>>]
   
   scala>     df_test.select(array_compact($"a")).show(false)
   +----------------------------+
   |array_compact(a)            |
   +----------------------------+
   |[[1, null, 3], [null, 2, 3]]|
   +----------------------------+
   
   ```
   I guess it is wrong; I believe I also need to traverse the elements if they are of type Array. Great find!





[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038198325


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryL
   override protected def withNewChildrenInternal(
     newLeft: Expression, newRight: Expression): ArrayExcept = copy(left = newLeft, right = newRight)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes null values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null));
+       [1,2,3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayCompact(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes with NullIntolerant {
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+  override def dataType: DataType = child.dataType
+
+  @transient private lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f if f.isFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, prettyName)

Review Comment:
   Got it, thanks





[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on code in PR #38874:
URL: https://github.com/apache/spark/pull/38874#discussion_r1038196131


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -2596,4 +2596,33 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
           Date.valueOf("2017-02-12")))
     }
   }
+
+  test("SPARK-41235: Array Compact") {
+    val a0 = Literal.create(Seq(1, null, 3, null, 2, 5), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", null, "c", "b"), ArrayType(StringType))
+    val a2 = Literal.create(Seq[String](null, "", null, ""), ArrayType(StringType))
+    val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a4 = Literal.create(null, ArrayType(StringType))
+    val a5 = Literal.create(Seq(1, null, 8, 9, null), ArrayType(IntegerType))
+    val a6 = Literal.create(Seq(true, false, null, true), ArrayType(BooleanType))
+    val a7 = Literal.create(Seq(null, null), ArrayType(IntegerType))
+
+    checkEvaluation(ArrayCompact(a0), Seq(1, 3, 2, 5))
+    checkEvaluation(ArrayCompact(a1), Seq( "a", "b", "c", "b"))
+    checkEvaluation(ArrayCompact(a2), Seq( "", ""))
+    checkEvaluation(ArrayCompact(a3), Seq.empty[Integer])
+    checkEvaluation(ArrayCompact(a4), null)
+    checkEvaluation(ArrayCompact(a5), Seq(1, 8, 9))
+    checkEvaluation(ArrayCompact(a6), Seq(true, false, true))
+    checkEvaluation(ArrayCompact(a7), Seq.empty[Integer])

Review Comment:
   I believe the input with variable `a4` covers this scenario





[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38874:
URL: https://github.com/apache/spark/pull/38874#issuecomment-1343752740

   > Basically, the implementation looks good. But we can use `RuntimeReplaceable` to simplify this PR.
   > 
   > `ArrayCompact` can reuse `ArrayRemove` and implement `RuntimeReplaceable`.
   > 
   > ```
   >       > SELECT array_compact(array(1, 2, 3, null));
   >         [1,2,3]
   > ```
   > 
   > equals to
   > 
   > ```
   >       > SELECT array_remove(array(1, 2, 3, null), null);
   >         [1,2,3]
   > ```
   
   Sounds like a good suggestion

