You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/04/11 06:25:01 UTC

[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/21037

    [SPARK-23919][SQL] Add array_position function

    ## What changes were proposed in this pull request?
    
    The PR adds the SQL function `array_position`. The behavior of the function is based on Presto's one.
    
    The function returns the position of the first occurrence of substr in the given string using 1-based index as BigInt.
    
    ## How was this patch tested?
    
    Added UTs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-23919

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21037.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21037
    
----
commit 80fc7b0cb5c61345d05bee971b13e44727b1e108
Author: Kazuaki Ishizaki <is...@...>
Date:   2018-04-11T06:21:07Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181616266
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
    @@ -105,4 +106,26 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
         checkEvaluation(ArrayContains(a3, Literal("")), null)
         checkEvaluation(ArrayContains(a3, Literal.create(null, StringType)), null)
       }
    +
    +
    --- End diff --
    
    nit: remove an extra line.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181833370
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    +  }
    +
    +  override def nullSafeEval(arr: Any, value: Any): Any = {
    +    arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
    +      if (v == value) {
    +        return (i + 1).toLong
    +      }
    +    )
    +    0L
    +  }
    +
    +  override def prettyName: String = "array_position"
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, (arr, value) => {
    +      val pos = ctx.freshName("arrayPosition")
    +      val i = ctx.freshName("i")
    +      val getValue = CodeGenerator.getValue(arr, right.dataType, i)
    +      s"""
    +         |int ${pos} = 0;
    +         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
    +         |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
    --- End diff --
    
    I totally agree with you. I added one test case for this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2203/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r180743190
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1002,6 +1002,57 @@ case class StringInstr(str: Expression, substr: Expression)
       }
     }
     
    +/**
    + * A function that returns the position of the first occurrence of substr in the given string
    + * as BigInt. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(str, substr) - Returns the (1-based) index of the first occurrence of `substr` in `str`.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_('SparkSQL', 'SQL');
    +       6
    +  """)
    --- End diff --
    
    Oh, good catch


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89314/testReport)** for PR 21037 at commit [`16ae59c`](https://github.com/apache/spark/commit/16ae59cf02da2cf0cd2e9a311b348bd82b452bff).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2195/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89411/testReport)** for PR 21037 at commit [`aac6f64`](https://github.com/apache/spark/commit/aac6f64030255b6937a4725f8ad59fcb0239f1db).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89411/testReport)** for PR 21037 at commit [`aac6f64`](https://github.com/apache/spark/commit/aac6f64030255b6937a4725f8ad59fcb0239f1db).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2254/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89255/testReport)** for PR 21037 at commit [`6a464a6`](https://github.com/apache/spark/commit/6a464a6670f48668101398d729ad134b4f2c34f8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2355/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89218/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181151969
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    --- End diff --
    
    Good question. I cannot find the behavior in Presto from the document and test cases. Thus, I leave as-is.
    Should we return `null`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181613044
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    +  }
    +
    +  override def nullSafeEval(arr: Any, value: Any): Any = {
    +    arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
    +      if (v == value) {
    +        return (i + 1).toLong
    +      }
    +    )
    +    0L
    +  }
    +
    +  override def prettyName: String = "array_position"
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, (arr, value) => {
    +      val pos = ctx.freshName("arrayPosition")
    +      val i = ctx.freshName("i")
    +      val getValue = CodeGenerator.getValue(arr, right.dataType, i)
    +      s"""
    +         |int ${pos} = 0;
    +         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
    +         |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
    +         |    ${pos} = $i + 1;
    +         |    break;
    +         |  }
    +         |}
    +         |${ev.value} = (long) $pos;
    +       """
    --- End diff --
    
    `stripMargin` is missing?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r182624162
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -505,3 +505,59 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast
     
       override def prettyName: String = "array_max"
     }
    +
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if the given value could not be found in the array. Returns null if either of
    + * the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first element in the array has
    + *       index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    --- End diff --
    
    Just wanted to note that we can use `note` here too:
    
    https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L101-L103
    
    I am mentioning this because we are adding many functions now :-).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2214/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89268/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181826436
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    --- End diff --
    
    ah, you are right.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181615751
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    +  }
    +
    +  override def nullSafeEval(arr: Any, value: Any): Any = {
    +    arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
    +      if (v == value) {
    +        return (i + 1).toLong
    +      }
    +    )
    +    0L
    +  }
    +
    +  override def prettyName: String = "array_position"
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, (arr, value) => {
    +      val pos = ctx.freshName("arrayPosition")
    +      val i = ctx.freshName("i")
    +      val getValue = CodeGenerator.getValue(arr, right.dataType, i)
    +      s"""
    +         |int ${pos} = 0;
    +         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
    +         |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
    --- End diff --
    
    I guess we need to check null for each array element?
    
    E.g. for the test `Array Position` in `CollectionExpressionsSuite`, if `a0` is as follows:
    
    ```
    val a0 = Literal.create(Seq(1, null, 2, 3), ArrayType(IntegerType))
    ```
    
    `checkEvaluation(ArrayPosition(a0, Literal(0)), 0L)` will fail.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181101544
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    --- End diff --
    
    What is the behavior when `left` contains null element?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181154371
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    --- End diff --
    
    Ah, if an array in `left` contains a `null` element, as you can see [a UT](https://github.com/apache/spark/pull/21037/files#diff-d31eca9f1c4c33104dc2cb8950486910R121), it works as an usual element.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89506/testReport)** for PR 21037 at commit [`7362b1c`](https://github.com/apache/spark/commit/7362b1c699ba0126b92944b6894aa111e31ea9da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89218/testReport)** for PR 21037 at commit [`ce5df02`](https://github.com/apache/spark/commit/ce5df02327278eba4f8b3d36233e26f90acd269b).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayPosition(str: Expression, substr: Expression)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    @bersprockets You are absolutely right. I made mistake. I will completely reimplement this soon.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r180739909
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1002,6 +1002,57 @@ case class StringInstr(str: Expression, substr: Expression)
       }
     }
     
    +/**
    + * A function that returns the position of the first occurrence of substr in the given string
    + * as BigInt. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(str, substr) - Returns the (1-based) index of the first occurrence of `substr` in `str`.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_('SparkSQL', 'SQL');
    +       6
    +  """)
    --- End diff --
    
    `since` :)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89506/testReport)** for PR 21037 at commit [`7362b1c`](https://github.com/apache/spark/commit/7362b1c699ba0126b92944b6894aa111e31ea9da).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2303/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89188/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2241/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21037


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181613270
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    +  }
    +
    +  override def nullSafeEval(arr: Any, value: Any): Any = {
    +    arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
    +      if (v == value) {
    +        return (i + 1).toLong
    +      }
    +    )
    +    0L
    +  }
    +
    +  override def prettyName: String = "array_position"
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, (arr, value) => {
    +      val pos = ctx.freshName("arrayPosition")
    +      val i = ctx.freshName("i")
    +      val getValue = CodeGenerator.getValue(arr, right.dataType, i)
    +      s"""
    +         |int ${pos} = 0;
    --- End diff --
    
    nit: `$pos`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89506/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r182297625
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1846,6 +1846,23 @@ def array_contains(col, value):
         return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
     
     
    +@since(2.4)
    +def array_position(col, value):
    +    """
    +    Collection function: Locates the position of the first occurrence of the given value
    +    in the given array. Returns null if either of the arguments are null.
    +
    +    .. note:: The position is not zero based, but 1 based index. Returns 0 if substr
    +        could not be found in str.
    --- End diff --
    
    `Returns 0 if substr could not be found in str` -> `Returns 0 if the value could not be found in the array` or something?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89200/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r182297896
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast
     
       override def prettyName: String = "array_max"
     }
    +
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    --- End diff --
    
    `Returns 0 if substr could not be found in str` -> `Returns 0 if the value could not be found in the array` or something?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89255/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2196/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89238/testReport)** for PR 21037 at commit [`3d687a0`](https://github.com/apache/spark/commit/3d687a0b7168c4c4db7da193b1b7c12eb409a730).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayPosition(left: Expression, right: Expression)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181614388
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    --- End diff --
    
    So we can't know the position of `null` in the array even if the array contains `null`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181827177
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    --- End diff --
    
    According to [these UTs in Presto](https://github.com/prestodb/presto/blob/master/presto-main/src/test/java/com/facebook/presto/type/TestArrayOperators.java#L651-L656), you are right.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2432/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89322/testReport)** for PR 21037 at commit [`16ae59c`](https://github.com/apache/spark/commit/16ae59cf02da2cf0cd2e9a311b348bd82b452bff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89176/testReport)** for PR 21037 at commit [`535e511`](https://github.com/apache/spark/commit/535e511a4d33d815a621b57c47901886b2ce8fc7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89268/testReport)** for PR 21037 at commit [`6a464a6`](https://github.com/apache/spark/commit/6a464a6670f48668101398d729ad134b4f2c34f8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89188/testReport)** for PR 21037 at commit [`535e511`](https://github.com/apache/spark/commit/535e511a4d33d815a621b57c47901886b2ce8fc7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r182298375
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast
     
       override def prettyName: String = "array_max"
     }
    +
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    --- End diff --
    
    nit: remove `that`?
    `The first character in str has index 1.` -> `The first element in the array has index 1.` or something?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Thanks! merging to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181100763
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    --- End diff --
    
    Looks like this is a broken sentence?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89218/testReport)** for PR 21037 at commit [`ce5df02`](https://github.com/apache/spark/commit/ce5df02327278eba4f8b3d36233e26f90acd269b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89314/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89314/testReport)** for PR 21037 at commit [`16ae59c`](https://github.com/apache/spark/commit/16ae59cf02da2cf0cd2e9a311b348bd82b452bff).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89200/testReport)** for PR 21037 at commit [`908472c`](https://github.com/apache/spark/commit/908472cdbc160e6a7c088b825afea59467e59990).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89255/testReport)** for PR 21037 at commit [`6a464a6`](https://github.com/apache/spark/commit/6a464a6670f48668101398d729ad134b4f2c34f8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2296/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181825716
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    +  }
    +
    +  override def nullSafeEval(arr: Any, value: Any): Any = {
    +    arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
    +      if (v == value) {
    +        return (i + 1).toLong
    +      }
    +    )
    +    0L
    +  }
    +
    +  override def prettyName: String = "array_position"
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, (arr, value) => {
    +      val pos = ctx.freshName("arrayPosition")
    +      val i = ctx.freshName("i")
    +      val getValue = CodeGenerator.getValue(arr, right.dataType, i)
    +      s"""
    +         |int ${pos} = 0;
    +         |for (int $i = 0; $i < $arr.numElements(); $i ++) {
    +         |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
    +         |    ${pos} = $i + 1;
    +         |    break;
    +         |  }
    +         |}
    +         |${ev.value} = (long) $pos;
    +       """
    --- End diff --
    
    good catch, sorry again


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89322/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89411/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89268/testReport)** for PR 21037 at commit [`6a464a6`](https://github.com/apache/spark/commit/6a464a6670f48668101398d729ad134b4f2c34f8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89176/testReport)** for PR 21037 at commit [`535e511`](https://github.com/apache/spark/commit/535e511a4d33d815a621b57c47901886b2ce8fc7).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89322/testReport)** for PR 21037 at commit [`16ae59c`](https://github.com/apache/spark/commit/16ae59cf02da2cf0cd2e9a311b348bd82b452bff).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181614224
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull
    --- End diff --
    
    I guess `left.dataType.asInstanceOf[ArrayType].containsNull` is not related to the nullability of this function?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r182298195
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast
     
       override def prettyName: String = "array_max"
     }
    +
    +
    +/**
    + * Returns the position of the first occurrence of element in the given array as long.
    + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null
    + *
    + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1.
    + */
    +@ExpressionDescription(
    +  usage = """
    +    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(3, 2, 1), 1);
    +       3
    +  """,
    +  since = "2.4.0")
    +case class ArrayPosition(left: Expression, right: Expression)
    +  extends BinaryExpression with ImplicitCastInputTypes {
    +
    +  override def dataType: DataType = LongType
    +  override def inputTypes: Seq[AbstractDataType] =
    +    Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
    +
    +  override def nullable: Boolean = {
    +    left.nullable || right.nullable
    +  }
    --- End diff --
    
    We don't need this here because this is the same as the one in `BinaryExpression`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89238/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89188/testReport)** for PR 21037 at commit [`535e511`](https://github.com/apache/spark/commit/535e511a4d33d815a621b57c47901886b2ce8fc7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2228/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by bersprockets <gi...@git.apache.org>.
Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Is array_position a string function? The function array_contains, for example, works on arrays.
    
    Also, this new function *seems* very similar as the existing instr ("Locate the position of the first occurrence of substr column in the given string"), except the return type is BigDecimal rather than Integer.
    
    For example, I ran a query using your branch:
    
    <pre>
    scala> val df = Seq((Seq(1, 2, 3), "this is a test"), (Seq(7, 8, 9, 3), "yeah this")).toDF("a", "b")
    df: org.apache.spark.sql.DataFrame = [a: array<int>, b: string]
    
    scala> df.select(array_contains('a, 2), array_position('b, "this"), instr('b, "this")).show
    +--------------------+-----------------------+--------------+
    |array_contains(a, 2)|array_position(b, this)|instr(b, this)|
    +--------------------+-----------------------+--------------+
    |                true|                      1|             1|
    |               false|                      6|             6|
    +--------------------+-----------------------+--------------+
    
    
    scala> 
    scala> df.select(array_position('a, 3)).show
    <console>:35: error: type mismatch;
     found   : Int(3)
     required: String
           df.select(array_position('a, 3)).show
                                        ^
    
    scala> df.select(array_contains('a, 3)).show
    +--------------------+
    |array_contains(a, 3)|
    +--------------------+
    |                true|
    |                true|
    +--------------------+
    
    
    scala> 
    </pre>



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    @ueshin could you please review again?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89176/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89200/testReport)** for PR 21037 at commit [`908472c`](https://github.com/apache/spark/commit/908472cdbc160e6a7c088b825afea59467e59990).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r181283242
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression)
     
       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * A function that returns the position of the first occurrence of element in the given array
    + * as long. Returns 0 if substr could not be found in str.
    + * Returns null if either of the arguments are null and
    --- End diff --
    
    Good catch, thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    **[Test build #89238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89238/testReport)** for PR 21037 at commit [`3d687a0`](https://github.com/apache/spark/commit/3d687a0b7168c4c4db7da193b1b7c12eb409a730).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2266/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21037#discussion_r182316985
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
    @@ -18,6 +18,7 @@
     package org.apache.spark.sql.catalyst.expressions
     
     import org.apache.spark.SparkFunSuite
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    --- End diff --
    
    nit: unnecessary import?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org