You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/06 23:40:42 UTC

[GitHub] [spark] navinvishy opened a new pull request, #38947: SPARK-41231: Adds an array_prepend function to catalyst

navinvishy opened a new pull request, #38947:
URL: https://github.com/apache/spark/pull/38947

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   Adds a new array function array_prepend to catalyst.
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   This adds a function that exists in many SQL implementations, specifically Snowflake: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   Added unit tests.
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042621443


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr

Review Comment:
   I'm using `ArrayData.foreach` right? Maybe you meant somewhere else?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1107724565


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))
+      s"""
+         |int $pos = 0;
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |$pos = $pos + 1;
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |  $pos = $pos + 1;
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""
+            ${leftGen.code}
+            ${rightGen.code}
+            ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+            $resultCode""", isNull = FalseLiteral)
+    }
+  }
+
+  override def prettyName: String = "array_prepend"
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+
+  override def dataType: DataType = left.dataType

Review Comment:
   I guess there is a test case missing to capture this nullability case? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471792565

   we need add a new rule to `CheckConnectJvmClientCompatibility`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042026795


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) | (NullType, _) =>
+        DataTypeMismatch(
+          errorSubClass = "NULL_TYPE",
+          messageParameters = Map("functionName" -> toSQLId(prettyName)))
+      case (l, _) if !ArrayType.acceptsType(l) =>
+        DataTypeMismatch(
+          errorSubClass = "UNEXPECTED_INPUT_TYPE",
+          messageParameters = Map(
+            "paramIndex" -> "1",
+            "requiredType" -> toSQLType(ArrayType),
+            "inputSql" -> toSQLExpr(left),
+            "inputType" -> toSQLType(left.dataType)))
+      case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
+        TypeUtils.checkForOrderingExpr(e2, prettyName)

Review Comment:
   Why need  check `e2` can be ordered ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133450986


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.
+      Null element is also prepended to the array. But if the array passed is NULL
+      output is NULL
+    """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+      > SELECT _FUNC_(array(1, 2, 3, null), null);
+       [null,1,2,3,null]
+      > SELECT _FUNC_(CAST(null as Array<Int>), 2);
+       NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val iPlus1 = s"$i+1"
+      val zero = "0"
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        elementType,
+        newArraySize,

Review Comment:
   do we really need this variable? can we just pass `s"$arr.numElements() + 1"`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133450131


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.

Review Comment:
   We don't need to document the type coercion behavior in each function. We have a dedicated doc to explain what "similar" type is: https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#type-promotion-and-precedence



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471792165

   I reopened the PR. @navinvishy would you mind rebasing and syncing to the latest master branch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471794732

   @navinvishy like `ProblemFilters.exclude[Problem]("org.apache.spark.sql.functions.array_prepend"),` and you can run ` dev/connect-jvm-client-mima-check ` to check the result


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1138152130


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.
+      Null element is also prepended to the array. But if the array passed is NULL
+      output is NULL
+    """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+      > SELECT _FUNC_(array(1, 2, 3, null), null);
+       [null,1,2,3,null]
+      > SELECT _FUNC_(CAST(null as Array<Int>), 2);
+       NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val iPlus1 = s"$i+1"
+      val zero = "0"
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        elementType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, elementType, arr, iPlus1, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, elementType, zero, value, Some(rightGen.isNull))
+      s"""
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)

Review Comment:
   Changed this, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1140881182


##########
sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -4043,6 +4043,16 @@ object functions {
   def array_compact(column: Column): Column = withExpr {
     ArrayCompact(column.expr)
   }
+    /**

Review Comment:
   @LuciferYang thanks. I've made the updates. There were some failures that seem to be unrelated to this change. I reran the workflow and seems to have succeeded this time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042143651


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr

Review Comment:
   I think we should use while loop since it's hot path



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(

Review Comment:
   indent



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) | (NullType, _) =>
+        DataTypeMismatch(
+          errorSubClass = "NULL_TYPE",
+          messageParameters = Map("functionName" -> toSQLId(prettyName)))
+      case (l, _) if !ArrayType.acceptsType(l) =>
+        DataTypeMismatch(
+          errorSubClass = "UNEXPECTED_INPUT_TYPE",
+          messageParameters = Map(
+            "paramIndex" -> "1",
+            "requiredType" -> toSQLType(ArrayType),
+            "inputSql" -> toSQLExpr(left),
+            "inputType" -> toSQLType(left.dataType)))
+      case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
+        TypeUtils.checkForOrderingExpr(e2, prettyName)
+      case _ =>
+        DataTypeMismatch(
+          errorSubClass = "ARRAY_FUNCTION_DIFF_TYPES",
+          messageParameters = Map(
+            "functionName" -> toSQLId(prettyName),
+            "dataType" -> toSQLType(ArrayType),
+            "leftType" -> toSQLType(left.dataType),
+            "rightType" -> toSQLType(right.dataType)))
+    }
+  }
+  override def inputTypes: Seq[AbstractDataType] = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) => Seq.empty

Review Comment:
   why need to deal with `NullType` here?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {

Review Comment:
   I think we can refer to `ArrayRemove#checkInputDataTypes` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041690900


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -119,21 +117,24 @@ case class Size(child: Expression, legacySizeOfNull: Boolean)
     val value = child.eval(input)
     if (value == null) {
       if (legacySizeOfNull) -1 else null
-    } else child.dataType match {
-      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
-      case _: MapType => value.asInstanceOf[MapData].numElements()
-      case other => throw QueryExecutionErrors.unsupportedOperandTypeForSizeFunctionError(other)
-    }
+    } else

Review Comment:
   Can we remove unrelated style changes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1107724565


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))
+      s"""
+         |int $pos = 0;
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |$pos = $pos + 1;
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |  $pos = $pos + 1;
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""
+            ${leftGen.code}
+            ${rightGen.code}
+            ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+            $resultCode""", isNull = FalseLiteral)
+    }
+  }
+
+  override def prettyName: String = "array_prepend"
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+
+  override def dataType: DataType = left.dataType

Review Comment:
   I guess there is a test case missing to capture this nullability case?  Something like left array is not nullable but right parameter is a null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1462275861

   @dongjoon-hyun @beliefer @LuciferYang @cloud-fan @HyukjinKwon gentle ping to please take a look. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471790173

   Seems like this broke the mima check for Spark connect. I am reverting this for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1475463815

   merged to master again 😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1341514611

   > The jira number should be [SPARK-41233](https://issues.apache.org/jira/browse/SPARK-41233)
   
   Changed. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041588232


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##########
@@ -692,6 +696,7 @@ object FunctionRegistry {
     expression[Sequence]("sequence"),
     expression[ArrayRepeat]("array_repeat"),
     expression[ArrayRemove]("array_remove"),
+    expression[ArrayPrepend]("array_prepend"),

Review Comment:
   Relevant diff here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1057148245


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -1840,6 +1840,47 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null)
   }
 
+  test("SPARK-41233: ArrayPrepend") {
+    val a0 = Literal.create(Seq(1, 2, 3, 4), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", "c"), ArrayType(StringType))
+    val a2 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a3 = Literal.create(null, ArrayType(StringType))
+
+    checkEvaluation(ArrayPrepend(a0, Literal(0)), Seq(0, 1, 2, 3, 4))
+    checkEvaluation(ArrayPrepend(a1, Literal("a")), Seq("a", "a", "b", "c"))
+    checkEvaluation(ArrayPrepend(a2, Literal(1)), Seq(1))
+    checkEvaluation(ArrayPrepend(a2, Literal(null, IntegerType)), null)

Review Comment:
   sorry for the late reply, we have a update on the null handling, that is:
   1, if the array is NULL, returns NULL;
   2, if the array is NOT NULL, the element is NULL, prepend the NULL value into the array.
   
   would you mind updating this behavior? thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1119413659


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) | (NullType, _) =>
+        DataTypeMismatch(
+          errorSubClass = "NULL_TYPE",
+          messageParameters = Map("functionName" -> toSQLId(prettyName)))
+      case (l, _) if !ArrayType.acceptsType(l) =>
+        DataTypeMismatch(
+          errorSubClass = "UNEXPECTED_INPUT_TYPE",
+          messageParameters = Map(
+            "paramIndex" -> "1",
+            "requiredType" -> toSQLType(ArrayType),
+            "inputSql" -> toSQLExpr(left),
+            "inputType" -> toSQLType(left.dataType)))
+      case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
+        TypeUtils.checkForOrderingExpr(e2, prettyName)
+      case _ =>
+        DataTypeMismatch(
+          errorSubClass = "ARRAY_FUNCTION_DIFF_TYPES",
+          messageParameters = Map(
+            "functionName" -> toSQLId(prettyName),
+            "dataType" -> toSQLType(ArrayType),
+            "leftType" -> toSQLType(left.dataType),
+            "rightType" -> toSQLType(right.dataType)))
+    }
+  }
+  override def inputTypes: Seq[AbstractDataType] = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) => Seq.empty

Review Comment:
   This matches what is in `ArrayAppend` now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1140918554


##########
sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -4043,6 +4043,16 @@ object functions {
   def array_compact(column: Column): Column = withExpr {
     ArrayCompact(column.expr)
   }
+    /**

Review Comment:
   All Passed now :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function
URL: https://github.com/apache/spark/pull/38947


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1407297181

   > > Could we wait array_append merged then this PR could follow some convention and make a better abstract ? @zhengruifeng @LuciferYang @HyukjinKwon
   > 
   > If it is possible, I think we should wait
   
   Looks like array_append has already been merged? https://github.com/apache/spark/pull/38865


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1409581323

   @navinvishy mind fix the scala linter failure?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1045166213


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -2651,6 +2651,58 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
     )
   }
 
+  test("array prepend") {

Review Comment:
   ```scala
   - test("array prepend") {
   + test("SPARK-41233: array prepend") {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1045166142


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -1840,6 +1840,47 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null)
   }
 
+  test("Array prepend") {

Review Comment:
   ```scala
   - test("Array prepend") {
   + test("SPARK-41233: ArrayPrepend") {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042618665


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) | (NullType, _) =>
+        DataTypeMismatch(
+          errorSubClass = "NULL_TYPE",
+          messageParameters = Map("functionName" -> toSQLId(prettyName)))
+      case (l, _) if !ArrayType.acceptsType(l) =>
+        DataTypeMismatch(
+          errorSubClass = "UNEXPECTED_INPUT_TYPE",
+          messageParameters = Map(
+            "paramIndex" -> "1",
+            "requiredType" -> toSQLType(ArrayType),
+            "inputSql" -> toSQLExpr(left),
+            "inputType" -> toSQLType(left.dataType)))
+      case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
+        TypeUtils.checkForOrderingExpr(e2, prettyName)
+      case _ =>
+        DataTypeMismatch(
+          errorSubClass = "ARRAY_FUNCTION_DIFF_TYPES",
+          messageParameters = Map(
+            "functionName" -> toSQLId(prettyName),
+            "dataType" -> toSQLType(ArrayType),
+            "leftType" -> toSQLType(left.dataType),
+            "rightType" -> toSQLType(right.dataType)))
+    }
+  }
+  override def inputTypes: Seq[AbstractDataType] = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) => Seq.empty

Review Comment:
   I thought this is good since we're declaring this to be `NullIntolerant`. I was also following what `ArrayContains` does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041587895


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1376,35 +1418,148 @@ case class ArrayContains(left: Expression, right: Expression)
            |   break;
            |}
          """.stripMargin
-      } else {
-        s"""
+        } else {
+          s"""
            |if (${ctx.genEqual(right.dataType, value, getValue)}) {
            |  ${ev.value} = true;
            |  break;
            |}
          """.stripMargin
-      }
-      s"""
+        }
+        s"""
          |for (int $i = 0; $i < $arr.numElements(); $i ++) {
          |  $loopBodyCode
          |}
        """.stripMargin
-    })
+      })
   }
 
   override def prettyName: String = "array_contains"
 
   override protected def withNewChildrenInternal(
-      newLeft: Expression, newRight: Expression): ArrayContains =
+      newLeft: Expression,
+      newRight: Expression): ArrayContains =
     copy(left = newLeft, right = newRight)
 }
+@ExpressionDescription(

Review Comment:
   Relevant diff here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041588680


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -1840,6 +1939,47 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null)
   }
 
+  test("Array prepend") {

Review Comment:
   Relevant diff here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1119610173


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,145 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =

Review Comment:
   please also document the null handling like `ArrayApend`:
   
   ```
    Type of element should be similar to type of the elements of the array.
         Null element is also appended into the array. But if the array passed, is NULL
         output is NULL
   ```
   
   examples
   ```
         > SELECT _FUNC_(array(1, 2, 3, null), null);
          [null,1,2,3,null]
         > SELECT _FUNC_(CAST(null as Array<Int>), 2);
          NULL
   ```
   



##########
python/pyspark/sql/functions.py:
##########
@@ -7619,6 +7619,36 @@ def get(col: "ColumnOrName", index: Union["ColumnOrName", int]) -> Column:
     return _invoke_function_over_columns("get", col, index)
 
 
+@try_remote_functions
+def array_prepend(col: "ColumnOrName", element: Any) -> Column:
+    """
+    Collection function: Returns an array containing element as
+    well as all elements from array. The new element is positioned
+    at the beginning of the array.
+
+    .. versionadded:: 3.4.0
+
+    Parameters
+    ----------
+    col : :class:`~pyspark.sql.Column` or str
+        name of column containing array
+    element :
+        element to be prepended to the array
+
+    Returns
+    -------
+    :class:`~pyspark.sql.Column`
+        an array excluding given value.
+
+    Examples
+    --------
+    >>> df = spark.createDataFrame([([2, 3, 4],), ([],)], ['data'])
+    >>> df.select(array_prepend(df.data, 1)).collect()
+    [Row(array_prepend(data, 1)=[1, 2, 3, 4]), Row(array_prepend(data, 1)=[1])]
+    """
+    return _invoke_function("array_prepend", _to_java_column(col), element)

Review Comment:
   ```suggestion
       return _invoke_function_over_columns("array_prepend", col, lit(value))
   ```



##########
python/pyspark/sql/functions.py:
##########
@@ -7619,6 +7619,36 @@ def get(col: "ColumnOrName", index: Union["ColumnOrName", int]) -> Column:
     return _invoke_function_over_columns("get", col, index)
 
 
+@try_remote_functions
+def array_prepend(col: "ColumnOrName", element: Any) -> Column:
+    """
+    Collection function: Returns an array containing element as
+    well as all elements from array. The new element is positioned
+    at the beginning of the array.
+
+    .. versionadded:: 3.4.0
+
+    Parameters
+    ----------
+    col : :class:`~pyspark.sql.Column` or str
+        name of column containing array
+    element :
+        element to be prepended to the array

Review Comment:
   ```suggestion
       value :
           a literal value, or a :class:`~pyspark.sql.Column` expression.
   ```



##########
python/pyspark/sql/functions.py:
##########
@@ -7619,6 +7619,36 @@ def get(col: "ColumnOrName", index: Union["ColumnOrName", int]) -> Column:
     return _invoke_function_over_columns("get", col, index)
 
 
+@try_remote_functions
+def array_prepend(col: "ColumnOrName", element: Any) -> Column:

Review Comment:
   ```suggestion
   def array_prepend(col: "ColumnOrName", value: Any) -> Column:
   ```
   
   to be consistent with `array_append`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,145 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")

Review Comment:
   I guess we only need one of var `i` and `pos`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1139918574


##########
sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -4043,6 +4043,16 @@ object functions {
   def array_compact(column: Column): Column = withExpr {
     ArrayCompact(column.expr)
   }
+    /**

Review Comment:
   nit: The indentation of comments is different from others



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1105245346


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))

Review Comment:
   I think we should not use `right.dataType` in `assignment ` and `newElemAssignment `.
   
   may use the
   ```
     @transient protected lazy val elementType: DataType =
       inputTypes.head.asInstanceOf[ArrayType].elementType
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))
+      s"""
+         |int $pos = 0;
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |$pos = $pos + 1;
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |  $pos = $pos + 1;
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""
+            ${leftGen.code}
+            ${rightGen.code}
+            ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+            $resultCode""", isNull = FalseLiteral)
+    }
+  }
+
+  override def prettyName: String = "array_prepend"
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+
+  override def dataType: DataType = left.dataType

Review Comment:
   ```suggestion
     override def dataType: DataType = if (right.nullable) left.dataType.asNullable else left.dataType
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr

Review Comment:
   would you minding make this `nullSafeEval` method and `doGenCode` method more consistent with `ArrayAppend`:
   
   ```
     @transient protected lazy val elementType: DataType =
       inputTypes.head.asInstanceOf[ArrayType].elementType
   
     override protected def nullSafeEval(arr: Any, elementData: Any): Any = {
       val arrayData = arr.asInstanceOf[ArrayData]
       val numberOfElements = arrayData.numElements() + 1
       if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
         throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
       }
       val finalData = new Array[Any](numberOfElements)
       finalData.update(0, elementData)
       arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
       new GenericArrayData(finalData)
     }
   ```
   
   I think we can make a better abstract later



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1465385482

   @navinvishy  sorry, it seems late for 3.4.0.
   would you mind changing the version from `3.4.0` to `3.5.0`?
   
   I am going to merge this PR to master this week if no more comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1365031813

   @navinvishy cloud you resolve the conflict?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1365029531

   also cc @beliefer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1046742799


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -1840,6 +1840,47 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null)
   }
 
+  test("Array prepend") {

Review Comment:
   This is not a bug fix, also need to add jira number?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1407296674

   > Could we wait array_append merged then this PR could follow some convention and make a better abstract ? @zhengruifeng @LuciferYang @HyukjinKwon
   
   If it is possible, I think we should wait
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133454186


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.
+      Null element is also prepended to the array. But if the array passed is NULL
+      output is NULL
+    """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+      > SELECT _FUNC_(array(1, 2, 3, null), null);
+       [null,1,2,3,null]
+      > SELECT _FUNC_(CAST(null as Array<Int>), 2);
+       NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val iPlus1 = s"$i+1"
+      val zero = "0"
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        elementType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, elementType, arr, iPlus1, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, elementType, zero, value, Some(rightGen.isNull))
+      s"""
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471205562

   @navinvishy would you mind addressing wenchen's comments? we can merge it then.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1340705645

   The jira number should be `[SPARK-41233](https://issues.apache.org/jira/browse/SPARK-41233)` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041589578


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -2645,78 +2257,103 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
         "paramIndex" -> "2",
         "inputSql" -> "\"1.0\"",
         "inputType" -> "\"DECIMAL(2,1)\"",
-        "requiredType" -> "\"INT\""
-      ),
-      queryContext = Array(ExpectedContext("", "", 0, 19, "array_repeat(a, 1.0)"))
-    )
+        "requiredType" -> "\"INT\""),
+      queryContext = Array(ExpectedContext("", "", 0, 19, "array_repeat(a, 1.0)")))
   }
 
+  test("array prepend") {

Review Comment:
   Relevant diff here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041817673


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -119,21 +117,24 @@ case class Size(child: Expression, legacySizeOfNull: Boolean)
     val value = child.eval(input)
     if (value == null) {
       if (legacySizeOfNull) -1 else null
-    } else child.dataType match {
-      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
-      case _: MapType => value.asInstanceOf[MapData].numElements()
-      case other => throw QueryExecutionErrors.unsupportedOperandTypeForSizeFunctionError(other)
-    }
+    } else

Review Comment:
   Yeah there were too many of those. I've removed them now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1449588306

   cc @dongjoon-hyun @beliefer @LuciferYang @cloud-fan @HyukjinKwon would you mind taking another look? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1465390080

   > @navinvishy sorry, it seems late for 3.4.0. would you mind changing the version from `3.4.0` to `3.5.0`?
   > 
   > I am going to merge this PR to master this week if no more comments.
   
   I've updated the version. Thanks @zhengruifeng !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy closed pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy closed pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function
URL: https://github.com/apache/spark/pull/38947


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1105246370


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))
+      s"""
+         |int $pos = 0;
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |$pos = $pos + 1;
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |  $pos = $pos + 1;
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""
+            ${leftGen.code}
+            ${rightGen.code}
+            ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+            $resultCode""", isNull = FalseLiteral)
+    }
+  }
+
+  override def prettyName: String = "array_prepend"
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+
+  override def dataType: DataType = left.dataType

Review Comment:
   see https://github.com/apache/spark/commit/718b6b7ed8277d5f6577367ab0d49f60f9777df7



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471632549

   merged into master, thank you @navinvishy for working on this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1118278902


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))

Review Comment:
   @zhengruifeng thanks, I've made this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1041589182


##########
sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -4018,10 +4181,22 @@ object functions {
    * @group collection_funcs
    * @since 2.4.0
    */
+
   def array_remove(column: Column, element: Any): Column = withExpr {
     ArrayRemove(column.expr, lit(element).expr)
   }
 
+  /**

Review Comment:
   Relevant diff here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1340714253

   @HyukjinKwon `ArrayAppend` and `ArrayPrepend` are special cases of `ArrayInsert`. Is it better to implement each function independently or consider them as a whole?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1107726442


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr

Review Comment:
   +1 to be consistent with `ArrayAppend`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1121154959


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,145 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")

Review Comment:
   @zhengruifeng I assume you mean that we could use `i+1` here for the assignment instead of using a different var `pos`? I've done that here - please let me know if that works.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1407298641

   > > > Could we wait array_append merged then this PR could follow some convention and make a better abstract ? @zhengruifeng @LuciferYang @HyukjinKwon
   > > 
   > > 
   > > If it is possible, I think we should wait
   > 
   > Looks like array_append has already been merged? #38865
   
   Also, I'm happy to try and unify `ArrayPrepend` and `ArrayAppend` (and maybe some others?) if that is useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133448219


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.

Review Comment:
   ```suggestion
         argument. Type of element should be the same to the type of the elements of the array.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133454113


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.
+      Null element is also prepended to the array. But if the array passed is NULL
+      output is NULL
+    """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+      > SELECT _FUNC_(array(1, 2, 3, null), null);
+       [null,1,2,3,null]
+      > SELECT _FUNC_(CAST(null as Array<Int>), 2);
+       NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val iPlus1 = s"$i+1"
+      val zero = "0"
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        elementType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, elementType, arr, iPlus1, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, elementType, zero, value, Some(rightGen.isNull))
+      s"""
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)

Review Comment:
   please please the same code style as
   ```
   s"""
      |...
      |...
      |""". stripMargin
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1138150738


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+      _FUNC_(array, element) - Add the element at the beginning of the array passed as first
+      argument. Type of element should be similar to type of the elements of the array.
+      Null element is also prepended to the array. But if the array passed is NULL
+      output is NULL
+    """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+       ["d","b","d","c","a"]
+      > SELECT _FUNC_(array(1, 2, 3, null), null);
+       [null,1,2,3,null]
+      > SELECT _FUNC_(CAST(null as Array<Int>), 2);
+       NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+    inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+    val arrayData = arr.asInstanceOf[ArrayData]
+    val numberOfElements = arrayData.numElements() + 1
+    if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val finalData = new Array[Any](numberOfElements)
+    finalData.update(0, elementData)
+    arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, v))
+    new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val iPlus1 = s"$i+1"
+      val zero = "0"
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        elementType,
+        newArraySize,

Review Comment:
   Yes, that wasn't really necessary. I've removed it, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1471352117

   > @navinvishy would you mind addressing wenchen's comments? we can merge it then.
   
   I've addressed them. Thanks for checking, @zhengruifeng !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1118278674


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))
+      s"""
+         |int $pos = 0;
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |$pos = $pos + 1;
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |  $pos = $pos + 1;
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""
+            ${leftGen.code}
+            ${rightGen.code}
+            ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+            $resultCode""", isNull = FalseLiteral)
+    }
+  }
+
+  override def prettyName: String = "array_prepend"
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+
+  override def dataType: DataType = left.dataType

Review Comment:
   @zhengruifeng I've made this change and added a test for null cases for both left and right params.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1340167033

   Running `./dev/scalafmt` produced a lot of formatting changes on these files. I've added comments to the portions that are relevant to the task so they are easy to find for reviewing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042152011


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr

Review Comment:
   or use `ArrayData.foreach`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042617849


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) | (NullType, _) =>
+        DataTypeMismatch(
+          errorSubClass = "NULL_TYPE",
+          messageParameters = Map("functionName" -> toSQLId(prettyName)))
+      case (l, _) if !ArrayType.acceptsType(l) =>
+        DataTypeMismatch(
+          errorSubClass = "UNEXPECTED_INPUT_TYPE",
+          messageParameters = Map(
+            "paramIndex" -> "1",
+            "requiredType" -> toSQLType(ArrayType),
+            "inputSql" -> toSQLExpr(left),
+            "inputType" -> toSQLType(left.dataType)))
+      case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
+        TypeUtils.checkForOrderingExpr(e2, prettyName)

Review Comment:
   There seems to be an ordering requirement for `ArrayData` since it has a `compare` and is backed by a `scala.collection.Seq`. But I'm ok with just returning a `TypeCheckResult.TypeCheckSuccess` here. I followed what `ArrayContains` did here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1042620296


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(
+      ctx,
+      ev,
+      (arr, value) => {
+        val newArraySize = ctx.freshName("newArraySize")
+        val newArray = ctx.freshName("newArray")
+        val i = ctx.freshName("i")
+        val pos = ctx.freshName("pos")
+        val allocation = CodeGenerator.createArrayData(
+          newArray,
+          right.dataType,
+          newArraySize,
+          s" $prettyName failed.")
+        val assignment =
+          CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+        val newElemAssignment =
+          CodeGenerator.setArrayElement(newArray, right.dataType, pos, value)
+        s"""
+           |int $pos = 0;
+           |int $newArraySize = $arr.numElements() + 1;
+           |$allocation
+           |$newElemAssignment
+           |$pos = $pos + 1;
+           |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+           |  $assignment
+           |  $pos = $pos + 1;
+           |}
+           |${ev.value} = $newArray;
+           |""".stripMargin
+      })
+  }
+
+  override def prettyName: String = "array_prepend"
+  override protected def withNewChildrenInternal(
+                                                  newLeft: Expression,
+                                                  newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+  override def dataType: DataType = left.dataType
+  override def checkInputDataTypes(): TypeCheckResult = {

Review Comment:
   I did what `ArrayContains` does. Maybe we should consolidate this, since it makes sense for many of these to do the same thing? eg. `ArrayContains`, `ArrayRemove`, `ArrayPrepend`, `ArrayAppend` etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38947: [SPARK-41233][SQL] Adds an array_prepend function to catalyst

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1343716355

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1365775810

   Could we wait array_append merged then this PR could follow some convention and make a better abstract ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1089643942


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -1840,6 +1840,47 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null)
   }
 
+  test("SPARK-41233: ArrayPrepend") {
+    val a0 = Literal.create(Seq(1, 2, 3, 4), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq("a", "b", "c"), ArrayType(StringType))
+    val a2 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+    val a3 = Literal.create(null, ArrayType(StringType))
+
+    checkEvaluation(ArrayPrepend(a0, Literal(0)), Seq(0, 1, 2, 3, 4))
+    checkEvaluation(ArrayPrepend(a1, Literal("a")), Seq("a", "a", "b", "c"))
+    checkEvaluation(ArrayPrepend(a2, Literal(1)), Seq(1))
+    checkEvaluation(ArrayPrepend(a2, Literal(null, IntegerType)), null)

Review Comment:
   @zhengruifeng sorry about the delay here. I've updated this to reflect the behavior you mentioned. Fixed the tests to show that as well. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1407311495

   I found `ArrayAppend ` already merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1422013069

   > @navinvishy mind fix the scala linter failure?
   
   Sorry for the delay @zhengruifeng . Fixing now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1109187704


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,149 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with ComplexTypeMergingExpression
+    with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val value1 = left.eval(input)
+    if (value1 == null) {
+      null
+    } else {
+      val value2 = right.eval(input)
+      nullSafeEval(value1, value2)
+    }
+  }
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr
+      .asInstanceOf[ArrayData]
+      .foreach(
+        right.dataType,
+        (i, v) => {
+          newArray(pos) = v
+          pos += 1
+        })
+    new GenericArrayData(newArray)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val f = (arr: String, value: String) => {
+      val newArraySize = ctx.freshName("newArraySize")
+      val newArray = ctx.freshName("newArray")
+      val i = ctx.freshName("i")
+      val pos = ctx.freshName("pos")
+      val allocation = CodeGenerator.createArrayData(
+        newArray,
+        right.dataType,
+        newArraySize,
+        s" $prettyName failed.")
+      val assignment =
+        CodeGenerator.createArrayAssignment(newArray, right.dataType, arr, pos, i, false)
+      val newElemAssignment =
+        CodeGenerator.setArrayElement(newArray, right.dataType, pos, value, Some(rightGen.isNull))
+      s"""
+         |int $pos = 0;
+         |int $newArraySize = $arr.numElements() + 1;
+         |$allocation
+         |$newElemAssignment
+         |$pos = $pos + 1;
+         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+         |  $assignment
+         |  $pos = $pos + 1;
+         |}
+         |${ev.value} = $newArray;
+         |""".stripMargin
+    }
+    val resultCode = f(leftGen.value, rightGen.value)
+    if(nullable) {
+      val nullSafeEval = leftGen.code + rightGen.code + ctx.nullSafeExec(nullable, leftGen.isNull) {
+        s"""
+           |${ev.isNull} = false;
+           |${resultCode}
+           |""".stripMargin
+      }
+      ev.copy(code =
+        code"""
+        boolean ${ev.isNull} = true;
+        ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+        $nullSafeEval
+      """)
+    } else {
+      ev.copy(code =
+        code"""
+            ${leftGen.code}
+            ${rightGen.code}
+            ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+            $resultCode""", isNull = FalseLiteral)
+    }
+  }
+
+  override def prettyName: String = "array_prepend"
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): ArrayPrepend =
+    copy(left = newLeft, right = newRight)
+
+  override def dataType: DataType = left.dataType

Review Comment:
   Yes, I think so



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

Posted by "navinvishy (via GitHub)" <gi...@apache.org>.
navinvishy commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1118279709


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1399,6 +1399,119 @@ case class ArrayContains(left: Expression, right: Expression)
     copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    "_FUNC_(array, value) - Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), 4);
+       [4, 1, 2, 3]
+  """,
+  group = "array_funcs",
+  since = "3.4.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+    with ImplicitCastInputTypes
+    with NullIntolerant
+    with QueryErrorsBase {
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+    val numberOfElements = arr.asInstanceOf[ArrayData].numElements()
+    if (numberOfElements + 1 > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+      throw QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+    }
+    val newArray = new Array[Any](numberOfElements + 1)
+    newArray(0) = value
+    var pos = 1
+    arr

Review Comment:
   Thanks, I've made this consistent with `ArrayAppend`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function
URL: https://github.com/apache/spark/pull/38947


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org