Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/04/18 16:09:22 UTC

[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/21102

    [SPARK-23913][SQL] Add array_intersect function 

    ## What changes were proposed in this pull request?
    
    The PR adds the SQL function `array_intersect`. Its behavior is based on Presto's implementation.
    
    This function returns an array of the elements in the intersection of array1 and array2.
    
    Note: The order of elements in the result is not defined.
    
    ## How was this patch tested?
    
    Added unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-23913

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21102
    
----
commit 548a4b804472e062e36308274d1aff8909621131
Author: Kazuaki Ishizaki <is...@...>
Date:   2018-04-18T16:01:50Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r205341581
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3805,3 +3801,339 @@ object ArrayUnion {
         new GenericArrayData(arrayBuffer)
       }
     }
    +
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike {
    +  override def dataType: DataType = ArrayType(elementType,
    +    left.dataType.asInstanceOf[ArrayType].containsNull &&
    +      right.dataType.asInstanceOf[ArrayType].containsNull)
    +
    +  var hsInt: OpenHashSet[Int] = _
    +  var hsResultInt: OpenHashSet[Int] = _
    +  var hsLong: OpenHashSet[Long] = _
    +  var hsResultLong: OpenHashSet[Long] = _
    +
    +  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
    +    val elem = array.getInt(idx)
    +    if (hsInt.contains(elem) && !hsResultInt.contains(elem)) {
    +      if (resultArray != null) {
    +        resultArray.setInt(pos, elem)
    +      }
    +      hsResultInt.add(elem)
    +      true
    +    } else {
    +      false
    +    }
    +  }
    +
    +  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
    +    val elem = array.getLong(idx)
    +    if (hsLong.contains(elem) && !hsResultLong.contains(elem)) {
    +      if (resultArray != null) {
    +        resultArray.setLong(pos, elem)
    +      }
    +      hsResultLong.add(elem)
    +      true
    +    } else {
    +      false
    +    }
    +  }
    +
    +  def evalIntLongPrimitiveType(
    +      array1: ArrayData,
    +      array2: ArrayData,
    +      resultArray: ArrayData,
    +      initFoundNullElement: Boolean,
    +      isLongType: Boolean): (Int, Boolean) = {
    +    // store elements into resultArray
    +    var i = 0
    +    var foundNullElement = initFoundNullElement
    +    if (resultArray == null) {
    +      // hsInt or hsLong is updated only once since it is not changed
    +      while (i < array1.numElements()) {
    --- End diff --
    
    Aren't `array1` and `array2` swapped here, if we want to preserve the element order of the left array?
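
To illustrate the ordering question, here is a minimal sketch using plain Scala collections rather than the `ArrayData` and `OpenHashSet` code in the diff. The helper name `intersectKeepLeftOrder` is hypothetical. Building the lookup set from the right array and then scanning the left array yields the result in left-array order:

```scala
import scala.collection.mutable

// Hypothetical helper, for illustration only (not part of the patch).
def intersectKeepLeftOrder[T](left: Seq[T], right: Seq[T]): Seq[T] = {
  val lookup = right.toSet                // membership test against the right array
  val emitted = mutable.HashSet.empty[T]  // elements already written to the result
  val result = mutable.ArrayBuffer.empty[T]
  for (elem <- left) {
    // emitted.add returns true only the first time elem is added
    if (lookup.contains(elem) && emitted.add(elem)) {
      result += elem
    }
  }
  result.toList
}
```

Swapping the two arguments changes the order of the result but not its contents, which is why the choice of which array goes into the hash set matters.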


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94221/testReport)** for PR 21102 at commit [`6fba1ee`](https://github.com/apache/spark/commit/6fba1ee8c3525a6f34bf5580737d067a8f0d976d).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike`


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2442/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r202782954
  
    --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala ---
    @@ -73,6 +73,46 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers {
         assert(set.contains(50))
         assert(set.contains(999))
         assert(!set.contains(10000))
    +
    +    set.add(1132)  // Cause hash contention with 999
    +    assert(set.size === 4)
    +    assert(set.contains(10))
    +    assert(set.contains(50))
    +    assert(set.contains(999))
    +    assert(set.contains(1132))
    +    assert(!set.contains(10000))
    +
    +    set.remove(1132)
    +    assert(set.size === 3)
    +    assert(set.contains(10))
    +    assert(set.contains(50))
    +    assert(set.contains(999))
    +    assert(!set.contains(1132))
    +    assert(!set.contains(10000))
    +
    +    set.remove(999)
    --- End diff --
    
    Good catch, I addressed this.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207765490
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3965,6 +4034,242 @@ object ArrayUnion {
       }
     }
     
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
    +  with ComplexTypeMergingExpression {
    +  override def dataType: DataType = {
    +    dataTypeCheck
    +    ArrayType(elementType,
    +      left.dataType.asInstanceOf[ArrayType].containsNull &&
    +        right.dataType.asInstanceOf[ArrayType].containsNull)
    +  }
    +
    +  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
    +    if (elementTypeSupportEquals) {
    +      (array1, array2) =>
    +        val hs = new OpenHashSet[Any]
    --- End diff --
    
    How about shortcutting to return an empty array when we find one of the two is empty?
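
A minimal sketch of the suggested shortcut, again with plain Scala collections instead of Spark's `ArrayData` (the name `intersectWithShortcut` is hypothetical). When either input is empty, the intersection is empty, so no hash set needs to be built:

```scala
// Hypothetical helper, for illustration only (not part of the patch).
def intersectWithShortcut[T](left: Seq[T], right: Seq[T]): Seq[T] = {
  if (left.isEmpty || right.isEmpty) {
    Seq.empty  // early exit: nothing can be in the intersection
  } else {
    val lookup = right.toSet
    // distinct keeps the first occurrence, so left-array order is preserved
    left.filter(lookup.contains).distinct
  }
}
```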


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203315417
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -114,6 +118,21 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
         rehashIfNeeded(k, grow, move)
       }
     
    +  /**
    +   * Remove an element from the set. If an element does not exists in the set, nothing is done.
    +   */
    +  def remove(k: T): Unit = {
    --- End diff --
    
    If we need to keep the order without duplicates, we can implement this by introducing another hash set or by searching the result array when we try to add a new element.
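
As a rough sketch of the second option (searching the result array), assuming plain Scala collections and a hypothetical helper name `intersectScanResult`: the linear scan of the result avoids a second hash set at the cost of quadratic work in the worst case.

```scala
import scala.collection.mutable

// Hypothetical helper, for illustration only (not part of the patch).
def intersectScanResult[T](left: Seq[T], right: Seq[T]): Seq[T] = {
  val lookup = right.toSet
  val result = mutable.ArrayBuffer.empty[T]
  for (elem <- left) {
    // A linear scan of the result stands in for the second hash set.
    if (lookup.contains(elem) && !result.contains(elem)) {
      result += elem
    }
  }
  result.toList
}
```

Neither option needs a `remove` operation on the set, since elements are only ever added.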


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93751/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92814/
    Test FAILed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r202621143
  
    --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala ---
    @@ -73,6 +73,46 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers {
         assert(set.contains(50))
         assert(set.contains(999))
         assert(!set.contains(10000))
    +
    +    set.add(1132)  // Cause hash contention with 999
    +    assert(set.size === 4)
    +    assert(set.contains(10))
    +    assert(set.contains(50))
    +    assert(set.contains(999))
    +    assert(set.contains(1132))
    +    assert(!set.contains(10000))
    +
    +    set.remove(1132)
    +    assert(set.size === 3)
    +    assert(set.contains(10))
    +    assert(set.contains(50))
    +    assert(set.contains(999))
    +    assert(!set.contains(1132))
    +    assert(!set.contains(10000))
    +
    +    set.remove(999)
    --- End diff --
    
    What if we remove `999` before `1132`?
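
The removal order matters in any open-addressing table: if the key that sits earlier on a probe path is removed by simply emptying its slot, a colliding key stored later on that path can become unreachable. The toy set below is a sketch under stated assumptions (linear probing, fixed capacity, non-negative keys, a tombstone marker). It is not Spark's `OpenHashSet` and does not claim to mirror how the patch handles removal; `ToyProbingSet` is a hypothetical name.

```scala
// Toy open-addressing set for non-negative Ints (linear probing, fixed capacity).
// Illustration only: this is not Spark's OpenHashSet.
final class ToyProbingSet(capacity: Int) {
  private val Empty = -1    // slot was never used
  private val Deleted = -2  // tombstone: slot held a key that was removed
  private val slots = Array.fill(capacity)(Empty)

  // Returns the slot index holding k, or -1 if k is absent.
  private def find(k: Int): Int = {
    var found = -1
    if (k >= 0) {
      var pos = k % capacity
      var probes = 0
      // Keep probing past tombstones; stop only at a never-used slot.
      while (found < 0 && probes < capacity && slots(pos) != Empty) {
        if (slots(pos) == k) found = pos
        pos = (pos + 1) % capacity
        probes += 1
      }
    }
    found
  }

  def contains(k: Int): Boolean = find(k) >= 0

  def add(k: Int): Unit = {
    require(k >= 0, "toy set stores non-negative keys only")
    if (!contains(k)) {
      var pos = k % capacity
      var probes = 0
      // Take the first free slot (never used or tombstoned) on the probe path.
      while (probes < capacity && slots(pos) >= 0) {
        pos = (pos + 1) % capacity
        probes += 1
      }
      require(probes < capacity, "toy set is full")
      slots(pos) = k
    }
  }

  def remove(k: Int): Unit = {
    val pos = find(k)
    // Mark as Deleted instead of Empty so later keys in the chain stay reachable.
    if (pos >= 0) slots(pos) = Deleted
  }
}
```

With capacity 8, the keys 7 and 15 share a home slot; removing 7 first and then checking `contains(15)` exercises the same scenario as removing `999` before `1132` in the test above.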


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89517/
    Test FAILed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203311109
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -85,9 +85,13 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
       protected var _capacity = nextPowerOf2(initialCapacity)
       protected var _mask = _capacity - 1
       protected var _size = 0
    +  protected var _occupied = 0
       protected var _growThreshold = (loadFactor * _capacity).toInt
    +  def g: Int = _growThreshold
    +  def o: Int = _occupied
    --- End diff --
    
    Also, please use more descriptive and comprehensible names


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207766511
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3965,6 +4034,242 @@ object ArrayUnion {
       }
     }
     
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
    +  with ComplexTypeMergingExpression {
    +  override def dataType: DataType = {
    +    dataTypeCheck
    +    ArrayType(elementType,
    +      left.dataType.asInstanceOf[ArrayType].containsNull &&
    +        right.dataType.asInstanceOf[ArrayType].containsNull)
    +  }
    +
    +  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
    +    if (elementTypeSupportEquals) {
    +      (array1, array2) =>
    +        val hs = new OpenHashSet[Any]
    +        val hsResult = new OpenHashSet[Any]
    +        var foundNullElement = false
    +        var i = 0
    +        while (i < array2.numElements()) {
    +          if (array2.isNullAt(i)) {
    +            foundNullElement = true
    +          } else {
    +            val elem = array2.get(i, elementType)
    +            hs.add(elem)
    +          }
    +          i += 1
    +        }
    +        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +        i = 0
    +        while (i < array1.numElements()) {
    +          if (array1.isNullAt(i)) {
    +            if (foundNullElement) {
    +              arrayBuffer += null
    +              foundNullElement = false
    +            }
    +          } else {
    +            val elem = array1.get(i, elementType)
    +            if (hs.contains(elem) && !hsResult.contains(elem)) {
    +              arrayBuffer += elem
    +              hsResult.add(elem)
    +            }
    +          }
    +          i += 1
    +        }
    +        new GenericArrayData(arrayBuffer)
    +    } else {
    +      (array1, array2) =>
    +        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +        var alreadySeenNull = false
    +        var i = 0
    +        while (i < array1.numElements()) {
    +          var found = false
    +          val elem1 = array1.get(i, elementType)
    +          if (array1.isNullAt(i)) {
    +            if (!alreadySeenNull) {
    +              var j = 0
    +              while (!found && j < array2.numElements()) {
    +                found = array2.isNullAt(j)
    +                j += 1
    +              }
    +              // array2 is scanned only once for null element
    +              alreadySeenNull = true
    +            }
    +          } else {
    +            var j = 0
    +            while (!found && j < array2.numElements()) {
    +              if (!array2.isNullAt(j)) {
    +                val elem2 = array2.get(j, elementType)
    +                if (ordering.equiv(elem1, elem2)) {
    +                  // check whether elem1 is already stored in arrayBuffer
    +                  var foundArrayBuffer = false
    +                  var k = 0
    +                  while (!foundArrayBuffer && k < arrayBuffer.size) {
    +                    val va = arrayBuffer(k)
    +                    foundArrayBuffer = (va != null) && ordering.equiv(va, elem1)
    +                    k += 1
    +                  }
    +                  found = !foundArrayBuffer
    +                }
    +              }
    +              j += 1
    +            }
    +          }
    +          if (found) {
    +            arrayBuffer += elem1
    +          }
    +          i += 1
    +        }
    +        new GenericArrayData(arrayBuffer)
    +    }
    +  }
    +
    +  override def nullSafeEval(input1: Any, input2: Any): Any = {
    +    val array1 = input1.asInstanceOf[ArrayData]
    +    val array2 = input2.asInstanceOf[ArrayData]
    +
    +    evalIntersect(array1, array2)
    +  }
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val arrayData = classOf[ArrayData].getName
    +    val i = ctx.freshName("i")
    +    val value = ctx.freshName("value")
    +    val size = ctx.freshName("size")
    +    if (canUseSpecializedHashSet) {
    +      val jt = CodeGenerator.javaType(elementType)
    +      val ptName = CodeGenerator.primitiveTypeName(jt)
    +
    +      nullSafeCodeGen(ctx, ev, (array1, array2) => {
    +        val foundNullElement = ctx.freshName("foundNullElement")
    +        val nullElementIndex = ctx.freshName("nullElementIndex")
    +        val builder = ctx.freshName("builder")
    +        val openHashSet = classOf[OpenHashSet[_]].getName
    +        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
    +        val hashSet = ctx.freshName("hashSet")
    +        val hashSetResult = ctx.freshName("hashSetResult")
    +        val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
    +        val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
    +        val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
    +
    +        def withArray2NullCheck(body: String): String =
    +          if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    +            if (left.dataType.asInstanceOf[ArrayType].containsNull) {
    +              s"""
    +                 |if ($array2.isNullAt($i)) {
    +                 |  $foundNullElement = true;
    +                 |} else {
    +                 |  $body
    +                 |}
    +               """.stripMargin
    +            } else {
    +              // if array1's element is not nullable, we don't need to track the null element index.
    +              s"""
    +                 |if (!$array2.isNullAt($i)) {
    +                 |  $body
    +                 |}
    +               """.stripMargin
    +            }
    +          } else {
    +            body
    +          }
    +
    +        val writeArray2ToHashSet = withArray2NullCheck(
    +          s"""
    +             |$jt $value = ${genGetValue(array2, i)};
    +             |$hashSet.add$hsPostFix($hsValueCast$value);
    +           """.stripMargin)
    +
    +        def withArray1NullAssignment(body: String) =
    +          if (left.dataType.asInstanceOf[ArrayType].containsNull) {
    +            if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    +              s"""
    +                 |if ($array1.isNullAt($i)) {
    +                 |  if ($foundNullElement) {
    +                 |    $nullElementIndex = $size;
    +                 |    $foundNullElement = false;
    +                 |    $size++;
    +                 |    $builder.$$plus$$eq($nullValueHolder);
    +                 |  }
    +                 |} else {
    +                 |  $body
    +                 |}
    +               """.stripMargin
    +            } else {
    +              s"""
    +                 |if (!$array1.isNullAt($i)) {
    +                 |  $body
    +                 |}
    +               """.stripMargin
    +            }
    +          } else {
    +            body
    +          }
    +
    +        val processArray1 = withArray1NullAssignment(
    +          s"""
    +             |$jt $value = ${genGetValue(array1, i)};
    +             |if ($hashSet.contains($hsValueCast$value) &&
    +             |    !$hashSetResult.contains($hsValueCast$value)) {
    +             |  if (++$size > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
    +             |    break;
    +             |  }
    +             |  $hashSetResult.add$hsPostFix($hsValueCast$value);
    +             |  $builder.$$plus$$eq($value);
    +             |}
    +           """.stripMargin)
    +
    +        // Only need to track null element index when result array's element is nullable.
    +        val declareNullTrackVariables = if (dataType.asInstanceOf[ArrayType].containsNull) {
    +          s"""
    +             |boolean $foundNullElement = false;
    +             |int $nullElementIndex = -1;
    +           """.stripMargin
    +        } else {
    +          ""
    +        }
    +
    +        s"""
    +           |$openHashSet $hashSet = new $openHashSet$hsPostFix($classTag);
    +           |$openHashSet $hashSetResult = new $openHashSet$hsPostFix($classTag);
    +           |$declareNullTrackVariables
    +           |for (int $i = 0; $i < $array2.numElements(); $i++) {
    +           |  $writeArray2ToHashSet
    +           |}
    +           |$arrayBuilderClass $builder =
    +           |  ($arrayBuilderClass)$arrayBuilder.make($arrayBuilderClassTag);
    --- End diff --
    
    nit: `new $arrayBuilderClass()` should work?


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93002/testReport)** for PR 21102 at commit [`5492572`](https://github.com/apache/spark/commit/5492572bb83b314cd31d116d3b344ae3c4596dbd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93398/
    Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203318288
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -85,9 +85,13 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
       protected var _capacity = nextPowerOf2(initialCapacity)
       protected var _mask = _capacity - 1
       protected var _size = 0
    +  protected var _occupied = 0
       protected var _growThreshold = (loadFactor * _capacity).toInt
    +  def g: Int = _growThreshold
    +  def o: Int = _occupied
    --- End diff --
    
    Oh, sorry for leaving this in. It was only used for my debugging and should be removed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1802/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    It seems that Presto returns the result in ascending order.
    ```
    presto> SELECT array_intersect(ARRAY[5, 8, null, 1], ARRAY[8, null, 1, 5]);
          _col0      
    -----------------
     [null, 1, 5, 8] 
    (1 row)
    ```
    Shouldn't we follow the same behavior if Presto is used as a reference?
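
For comparison, a sketch of what matching the ordering observed above might look like, assuming plain Scala collections with `Option[Int]` standing in for nullable elements (the helper name `intersectSorted` is hypothetical). Scala's default `Ordering[Option[Int]]` places `None` before any `Some`, which mirrors the null-first output shown:

```scala
// Hypothetical helper, for illustration only (not part of the patch).
// None models a SQL null element.
def intersectSorted(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] = {
  val lookup = right.toSet
  // Filter, drop duplicates, then sort ascending with None first.
  left.filter(lookup.contains).distinct.sorted
}
```

Sorting adds an O(n log n) step and discards the left array's order, so it trades off against the order-preserving approach discussed elsewhere in this thread.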


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r204349890
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3805,3 +3801,339 @@ object ArrayUnion {
         new GenericArrayData(arrayBuffer)
       }
     }
    +
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    --- End diff --
    
    Just ```Examples:```?


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94231/testReport)** for PR 21102 at commit [`6fba1ee`](https://github.com/apache/spark/commit/6fba1ee8c3525a6f34bf5580737d067a8f0d976d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike`


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Thanks! merging to master.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    In my opinion, the result should preserve the element order of the left array.
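
A minimal sketch of what preserving the left array's order could look like, written in plain Scala rather than against Spark's `ArrayData`. The names `LeftOrderIntersectSketch` and `intersectKeepLeftOrder` are hypothetical, and `None` models a NULL element. It loosely follows the hash-set approach visible in the quoted diffs: collect the right array once, then walk the left array and emit each common element the first time it is seen.

```scala
import scala.collection.mutable

object LeftOrderIntersectSketch {
  // None models a SQL NULL element.
  def intersectKeepLeftOrder[T](left: Seq[Option[T]], right: Seq[Option[T]]): Seq[Option[T]] = {
    // Collect the non-null values of the right array and remember whether it holds a NULL.
    val rightValues = mutable.HashSet.empty[T]
    var rightHasNull = false
    right.foreach {
      case Some(v) => rightValues += v
      case None => rightHasNull = true
    }

    val emitted = mutable.HashSet.empty[T]
    var nullEmitted = false
    val out = mutable.ArrayBuffer.empty[Option[T]]
    left.foreach {
      case Some(v) =>
        // HashSet.add returns true only the first time v is added, so duplicates are skipped.
        if (rightValues.contains(v) && emitted.add(v)) out += Some(v)
      case None =>
        // NULL is emitted at most once, and only if the right array also contains one.
        if (rightHasNull && !nullEmitted) {
          out += None
          nullEmitted = true
        }
    }
    out.toList
  }
}
```

With the inputs used in the PR's `DataFrameFunctionsSuite` test, `(1, 2, null, 4, 5)` and `(-5, 4, null, 2, -1)`, this sketch yields `2, null, 4`, which matches the expected answer in that test.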


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    cc @ueshin


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207758427
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
    @@ -1647,6 +1647,60 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
         assert(result10.first.schema(0).dataType === expectedType10)
       }
     
    +  test("array_intersect functions") {
    +    val df1 = Seq((Array(1, 2, 4), Array(4, 2))).toDF("a", "b")
    +    val ans1 = Row(Seq(2, 4))
    +    checkAnswer(df1.select(array_intersect($"a", $"b")), ans1)
    +    checkAnswer(df1.selectExpr("array_intersect(a, b)"), ans1)
    +
    +    val df2 = Seq((Array[Integer](1, 2, null, 4, 5), Array[Integer](-5, 4, null, 2, -1)))
    +      .toDF("a", "b")
    +    val ans2 = Row(Seq(2, null, 4))
    +    checkAnswer(df2.select(array_intersect($"a", $"b")), ans2)
    +    checkAnswer(df2.selectExpr("array_intersect(a, b)"), ans2)
    +
    +    val df3 = Seq((Array(1L, 2L, 4L), Array(4L, 2L))).toDF("a", "b")
    +    val ans3 = Row(Seq(2L, 4L))
    +    checkAnswer(df3.select(array_intersect($"a", $"b")), ans3)
    +    checkAnswer(df3.selectExpr("array_intersect(a, b)"), ans3)
    +
    +    val df4 = Seq(
    +      (Array[java.lang.Long](1L, 2L, null, 4L, 5L), Array[java.lang.Long](-5L, 4L, null, 2L, -1L)))
    +      .toDF("a", "b")
    +    val ans4 = Row(Seq(2L, null, 4L))
    +    checkAnswer(df4.select(array_intersect($"a", $"b")), ans4)
    +    checkAnswer(df4.selectExpr("array_intersect(a, b)"), ans4)
    +
    +    val df5 = Seq((Array("c", null, "a", "f"), Array("b", "a", null, "g"))).toDF("a", "b")
    +    val ans5 = Row(Seq(null, "a"))
    +    checkAnswer(df5.select(array_intersect($"a", $"b")), ans5)
    +    checkAnswer(df5.selectExpr("array_intersect(a, b)"), ans5)
    +
    +    val df6 = Seq((null, null)).toDF("a", "b")
    +    intercept[AnalysisException] {
    --- End diff --
    
    Could you also check the error message?
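
One way to act on this request is to keep the exception that `intercept` returns and assert on its message. The sketch below is dependency-free Scala, so it uses stand-ins: `FakeAnalysisException`, the local `intercept` helper, `failingQuery`, and the message text are all hypothetical. In the real suite, ScalaTest's `intercept` already returns the caught exception, and the expected text would have to be taken from an actual run.

```scala
import scala.reflect.ClassTag

object InterceptMessageSketch {
  // Hypothetical stand-in for Spark's AnalysisException, so the sketch has no dependencies.
  final class FakeAnalysisException(message: String) extends Exception(message)

  // Minimal stand-in for ScalaTest's intercept: returns the thrown exception if it has
  // the expected type, rethrows anything else, and fails if nothing is thrown.
  def intercept[E <: Throwable](body: => Any)(implicit tag: ClassTag[E]): E = {
    val thrown: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    thrown match {
      case Some(e) if tag.runtimeClass.isInstance(e) => e.asInstanceOf[E]
      case Some(other) => throw other
      case None =>
        throw new AssertionError(s"expected ${tag.runtimeClass.getName}, but nothing was thrown")
    }
  }

  // Placeholder for the failing query; the message text is invented for illustration.
  def failingQuery(): Unit =
    throw new FakeAnalysisException("placeholder: data type mismatch")

  def main(args: Array[String]): Unit = {
    val e = intercept[FakeAnalysisException] { failingQuery() }
    // Asserting on the message pins down why the call failed, not just that it failed.
    assert(e.getMessage.contains("data type mismatch"))
  }
}
```

Matching on a substring rather than the full message keeps the test stable if the surrounding wording changes.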


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1834/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1774/
    Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r205959224
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3968,3 +3964,234 @@ object ArrayUnion {
         new GenericArrayData(arrayBuffer)
       }
     }
    +
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike {
    +  override def dataType: DataType = ArrayType(elementType,
    +    left.dataType.asInstanceOf[ArrayType].containsNull &&
    +      right.dataType.asInstanceOf[ArrayType].containsNull)
    +
    +  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
    +    if (elementTypeSupportEquals) {
    +      (array1, array2) =>
    +        val hs = new OpenHashSet[Any]
    +        val hsResult = new OpenHashSet[Any]
    +        var foundNullElement = false
    +        var i = 0
    +        while (i < array2.numElements()) {
    +          if (array2.isNullAt(i)) {
    +            foundNullElement = true
    +          } else {
    +            val elem = array2.get(i, elementType)
    +            hs.add(elem)
    +          }
    +          i += 1
    +        }
    +        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +        i = 0
    +        while (i < array1.numElements()) {
    +          if (array1.isNullAt(i)) {
    +            if (foundNullElement) {
    +              arrayBuffer += null
    +              foundNullElement = false
    +            }
    +          } else {
    +            val elem = array1.get(i, elementType)
    +            if (hs.contains(elem) && !hsResult.contains(elem)) {
    +              arrayBuffer += elem
    +              hsResult.add(elem)
    +            }
    +          }
    +          i += 1
    +        }
    +        new GenericArrayData(arrayBuffer)
    +    } else {
    +      (array1, array2) =>
    +        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +        var alreadySeenNull = false
    +        var i = 0
    +        while (i < array1.numElements()) {
    +          var found = false
    +          val elem1 = array1.get(i, elementType)
    +          if (array1.isNullAt(i)) {
    +            if (!alreadySeenNull) {
    +              var j = 0
    +              while (!found && j < array2.numElements()) {
    +                found = array2.isNullAt(j)
    +                j += 1
    +              }
    +              // array2 is scanned only once for null element
    +              alreadySeenNull = true
    +            }
    +          } else {
    +            var j = 0
    +            while (!found && j < array2.numElements()) {
    +              if (!array2.isNullAt(j)) {
    +                val elem2 = array2.get(j, elementType)
    +                if (ordering.equiv(elem1, elem2)) {
    +                  // check whether elem1 is already stored in arrayBuffer
    +                  var foundArrayBuffer = false
    +                  var k = 0
    +                  while (!foundArrayBuffer && k < arrayBuffer.size) {
    +                    val va = arrayBuffer(k)
    +                    foundArrayBuffer = (va != null) && ordering.equiv(va, elem1)
    +                    k += 1
    +                  }
    +                  found = !foundArrayBuffer
    +                }
    +              }
    +              j += 1
    +            }
    +          }
    +          if (found) {
    +            arrayBuffer += elem1
    +          }
    +          i += 1
    +        }
    +        new GenericArrayData(arrayBuffer)
    +    }
    +  }
    +
    +  override def nullSafeEval(input1: Any, input2: Any): Any = {
    +    val array1 = input1.asInstanceOf[ArrayData]
    +    val array2 = input2.asInstanceOf[ArrayData]
    +
    +    evalIntersect(array1, array2)
    +  }
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val arrayData = classOf[ArrayData].getName
    +    val i = ctx.freshName("i")
    --- End diff --
    
    It would be good to refactor L4077 to L4124 into a method, since this part can be shared among `union`, `except`, and `intersect`.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93146/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93398/testReport)** for PR 21102 at commit [`0307c1d`](https://github.com/apache/spark/commit/0307c1ddedf2ecc8130d372a18e619fbdb4f5dc7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93157/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92323/testReport)** for PR 21102 at commit [`cd56b7d`](https://github.com/apache/spark/commit/cd56b7dcecf8228cb92ac40e028ac35d028065f5).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes `


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94213/testReport)** for PR 21102 at commit [`6fba1ee`](https://github.com/apache/spark/commit/6fba1ee8c3525a6f34bf5580737d067a8f0d976d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike`


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92814/testReport)** for PR 21102 at commit [`346274d`](https://github.com/apache/spark/commit/346274d577e7aef513477952333dfe0b431b5b2d).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r205342201
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3805,3 +3801,339 @@ object ArrayUnion {
         new GenericArrayData(arrayBuffer)
       }
     }
    +
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike {
    +  override def dataType: DataType = ArrayType(elementType,
    +    left.dataType.asInstanceOf[ArrayType].containsNull &&
    +      right.dataType.asInstanceOf[ArrayType].containsNull)
    +
    +  var hsInt: OpenHashSet[Int] = _
    +  var hsResultInt: OpenHashSet[Int] = _
    +  var hsLong: OpenHashSet[Long] = _
    +  var hsResultLong: OpenHashSet[Long] = _
    +
    +  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
    +    val elem = array.getInt(idx)
    +    if (hsInt.contains(elem) && !hsResultInt.contains(elem)) {
    +      if (resultArray != null) {
    +        resultArray.setInt(pos, elem)
    +      }
    +      hsResultInt.add(elem)
    +      true
    +    } else {
    +      false
    +    }
    +  }
    +
    +  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
    +    val elem = array.getLong(idx)
    +    if (hsLong.contains(elem) && !hsResultLong.contains(elem)) {
    +      if (resultArray != null) {
    +        resultArray.setLong(pos, elem)
    +      }
    +      hsResultLong.add(elem)
    +      true
    +    } else {
    +      false
    +    }
    +  }
    +
    +  def evalIntLongPrimitiveType(
    +      array1: ArrayData,
    +      array2: ArrayData,
    +      resultArray: ArrayData,
    +      initFoundNullElement: Boolean,
    +      isLongType: Boolean): (Int, Boolean) = {
    +    // store elements into resultArray
    +    var i = 0
    +    var foundNullElement = initFoundNullElement
    +    if (resultArray == null) {
    +      // hsInt or hsLong is updated only once since it is not changed
    --- End diff --
    
    I might be missing something, but can we do the same thing for `array_except`? It would be good if we could skip traversing the right array. This is not urgent; maybe we can do it in a follow-up PR to the `array_except` PR.
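
The idea of traversing the right array only once can be sketched as follows. This is an assumption-laden illustration in plain Scala, not the PR's code: the quoted diff is truncated, so the count-then-fill structure here is inferred from the `resultArray == null` check and its comment, the names `TwoPassIntersectSketch`, `intersect`, and `pass` are hypothetical, and NULL handling is omitted for brevity.

```scala
import scala.collection.mutable

object TwoPassIntersectSketch {
  def intersect(left: Array[Int], right: Array[Int]): Array[Int] = {
    // Built once: the right array is traversed a single time, and both passes reuse the set.
    val rightSet = mutable.HashSet.empty[Int]
    right.foreach(rightSet += _)

    // One pass over the left array. With sink = None it only counts the result size;
    // with sink = Some(array) it also writes each result element into that array.
    def pass(sink: Option[Array[Int]]): Int = {
      val seen = mutable.HashSet.empty[Int]
      var pos = 0
      var i = 0
      while (i < left.length) {
        val elem = left(i)
        if (rightSet.contains(elem) && seen.add(elem)) {
          sink.foreach(arr => arr(pos) = elem)
          pos += 1
        }
        i += 1
      }
      pos
    }

    val size = pass(None)             // pass 1: count, so the result can be sized exactly
    val result = new Array[Int](size)
    pass(Some(result))                // pass 2: fill, without touching the right array again
    result
  }
}
```

The same shape would apply to an except-style operation by negating the `rightSet.contains(elem)` test, which is roughly what the comment above asks about.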


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    I would like to hear others' opinions about the order of the result.
    cc @ueshin


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207967923
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3965,6 +4034,248 @@ object ArrayUnion {
       }
     }
     
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
    +  with ComplexTypeMergingExpression {
    +  override def dataType: DataType = {
    +    dataTypeCheck
    +    ArrayType(elementType,
    +      left.dataType.asInstanceOf[ArrayType].containsNull &&
    +        right.dataType.asInstanceOf[ArrayType].containsNull)
    +  }
    +
    +  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
    +    if (elementTypeSupportEquals) {
    +      (array1, array2) =>
    +        if (array1.numElements() != 0 && array2.numElements() != 0) {
    +          val hs = new OpenHashSet[Any]
    +          val hsResult = new OpenHashSet[Any]
    +          var foundNullElement = false
    +          var i = 0
    +          while (i < array2.numElements()) {
    +            if (array2.isNullAt(i)) {
    +              foundNullElement = true
    +            } else {
    +              val elem = array2.get(i, elementType)
    +              hs.add(elem)
    +            }
    +            i += 1
    +          }
    +          val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +          i = 0
    +          while (i < array1.numElements()) {
    +            if (array1.isNullAt(i)) {
    +              if (foundNullElement) {
    +                arrayBuffer += null
    +                foundNullElement = false
    +              }
    +            } else {
    +              val elem = array1.get(i, elementType)
    +              if (hs.contains(elem) && !hsResult.contains(elem)) {
    +                arrayBuffer += elem
    +                hsResult.add(elem)
    +              }
    +            }
    +            i += 1
    +          }
    +          new GenericArrayData(arrayBuffer)
    +        } else {
    +          new GenericArrayData(Array.emptyObjectArray)
    +        }
    +    } else {
    +      (array1, array2) =>
    +        if (array1.numElements() != 0 && array2.numElements() != 0) {
    +          val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +          var alreadySeenNull = false
    +          var i = 0
    +          while (i < array1.numElements()) {
    +            var found = false
    +            val elem1 = array1.get(i, elementType)
    +            if (array1.isNullAt(i)) {
    +              if (!alreadySeenNull) {
    +                var j = 0
    +                while (!found && j < array2.numElements()) {
    +                  found = array2.isNullAt(j)
    +                  j += 1
    +                }
    +                // array2 is scanned only once for null element
    +                alreadySeenNull = true
    +              }
    +            } else {
    +              var j = 0
    +              while (!found && j < array2.numElements()) {
    +                if (!array2.isNullAt(j)) {
    +                  val elem2 = array2.get(j, elementType)
    +                  if (ordering.equiv(elem1, elem2)) {
    +                    // check whether elem1 is already stored in arrayBuffer
    +                    var foundArrayBuffer = false
    +                    var k = 0
    +                    while (!foundArrayBuffer && k < arrayBuffer.size) {
    +                      val va = arrayBuffer(k)
    +                      foundArrayBuffer = (va != null) && ordering.equiv(va, elem1)
    +                      k += 1
    +                    }
    +                    found = !foundArrayBuffer
    +                  }
    +                }
    +                j += 1
    +              }
    +            }
    +            if (found) {
    +              arrayBuffer += elem1
    +            }
    +            i += 1
    +          }
    +          new GenericArrayData(arrayBuffer)
    +        } else {
    +          new GenericArrayData(Array.emptyObjectArray)
    +        }
    +    }
    +  }
    +
    +  override def nullSafeEval(input1: Any, input2: Any): Any = {
    +    val array1 = input1.asInstanceOf[ArrayData]
    +    val array2 = input2.asInstanceOf[ArrayData]
    +
    +    evalIntersect(array1, array2)
    +  }
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val arrayData = classOf[ArrayData].getName
    --- End diff --
    
    Thanks. I will address this in #21937.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94231/testReport)** for PR 21102 at commit [`6fba1ee`](https://github.com/apache/spark/commit/6fba1ee8c3525a6f34bf5580737d067a8f0d976d).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1423/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/907/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92818/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93394/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92323/testReport)** for PR 21102 at commit [`cd56b7d`](https://github.com/apache/spark/commit/cd56b7dcecf8228cb92ac40e028ac35d028065f5).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/948/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94174/testReport)** for PR 21102 at commit [`ce1bfb0`](https://github.com/apache/spark/commit/ce1bfb04e774b3c18b31a33c13ce1a0cc0632419).


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207723142
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3805,3 +3801,339 @@ object ArrayUnion {
         new GenericArrayData(arrayBuffer)
       }
     }
    +
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    --- End diff --
    
    This is back again?


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89527/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1461/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93146/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1814/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94266/testReport)** for PR 21102 at commit [`33781b6`](https://github.com/apache/spark/commit/33781b640ed447d9a73a93b63e1834dd9360e72a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89517/testReport)** for PR 21102 at commit [`548a4b8`](https://github.com/apache/spark/commit/548a4b804472e062e36308274d1aff8909621131).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1005/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89538/testReport)** for PR 21102 at commit [`038e98c`](https://github.com/apache/spark/commit/038e98c1f3603013cb9b67562305b0863805afcf).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93751/testReport)** for PR 21102 at commit [`c398291`](https://github.com/apache/spark/commit/c3982911d6195fc1bc1c63d72d4b3273f958accd).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2558/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1455/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/818/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89671/testReport)** for PR 21102 at commit [`cd56b7d`](https://github.com/apache/spark/commit/cd56b7dcecf8228cb92ac40e028ac35d028065f5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes `


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1833/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    cc @ueshin @cloud-fan


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93691/testReport)** for PR 21102 at commit [`a717ec9`](https://github.com/apache/spark/commit/a717ec9bb6d8bc0f907ff60bb568b7659936031d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  sealed class Hasher[@specialized(Long, Int, Double, Float) T] extends Serializable `
      * `  class DoubleHasher extends Hasher[Double] `
      * `  class FloatHasher extends Hasher[Float] `


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92947/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Build finished. Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207781744
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3965,6 +4034,248 @@ object ArrayUnion {
       }
     }
     
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
    +  with ComplexTypeMergingExpression {
    +  override def dataType: DataType = {
    +    dataTypeCheck
    +    ArrayType(elementType,
    +      left.dataType.asInstanceOf[ArrayType].containsNull &&
    +        right.dataType.asInstanceOf[ArrayType].containsNull)
    +  }
    +
    +  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
    +    if (elementTypeSupportEquals) {
    +      (array1, array2) =>
    +        if (array1.numElements() != 0 && array2.numElements() != 0) {
    +          val hs = new OpenHashSet[Any]
    +          val hsResult = new OpenHashSet[Any]
    +          var foundNullElement = false
    +          var i = 0
    +          while (i < array2.numElements()) {
    +            if (array2.isNullAt(i)) {
    +              foundNullElement = true
    +            } else {
    +              val elem = array2.get(i, elementType)
    +              hs.add(elem)
    +            }
    +            i += 1
    +          }
    +          val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +          i = 0
    +          while (i < array1.numElements()) {
    +            if (array1.isNullAt(i)) {
    +              if (foundNullElement) {
    +                arrayBuffer += null
    +                foundNullElement = false
    +              }
    +            } else {
    +              val elem = array1.get(i, elementType)
    +              if (hs.contains(elem) && !hsResult.contains(elem)) {
    +                arrayBuffer += elem
    +                hsResult.add(elem)
    +              }
    +            }
    +            i += 1
    +          }
    +          new GenericArrayData(arrayBuffer)
    +        } else {
    +          new GenericArrayData(Seq.empty)
    --- End diff --
    
    nit: `Array.empty` or `Array.emptyObjectArray`?
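
    For readers following the diff above, the interpreted path can be approximated with plain Scala collections. This is a minimal sketch, not Spark's implementation: `ArrayIntersectSketch` and `intersect` are hypothetical names, nulls are modeled as `None`, and `mutable.HashSet` stands in for `OpenHashSet`. It keeps the order of the first array, emits each value once, and emits a null at most once and only when both inputs contain one.

```scala
import scala.collection.mutable

// Hypothetical helper for illustration only; not part of Spark.
object ArrayIntersectSketch {
  def intersect[T](array1: Seq[Option[T]], array2: Seq[Option[T]]): Seq[Option[T]] = {
    if (array1.isEmpty || array2.isEmpty) {
      // Short-circuit: the intersection with an empty array is empty.
      Seq.empty
    } else {
      // Pass 1: remember every non-null value of array2 and whether it has a null.
      val seenInRight = mutable.HashSet.empty[T]
      var rightHasNull = false
      array2.foreach {
        case Some(v) => seenInRight += v
        case None => rightHasNull = true
      }
      // Pass 2: walk array1 in order, keeping values that also occur in array2,
      // and skipping anything already emitted.
      val emitted = mutable.HashSet.empty[T]
      var nullEmitted = false
      val result = mutable.ArrayBuffer.empty[Option[T]]
      array1.foreach {
        case Some(v) =>
          // HashSet.add returns true only when v was not already present.
          if (seenInRight.contains(v) && emitted.add(v)) result += Some(v)
        case None =>
          if (rightHasNull && !nullEmitted) {
            result += None
            nullEmitted = true
          }
      }
      result.toSeq
    }
  }
}
```

    The empty-input branch is where the nit applies in the real code: the choice between `Seq.empty` and an empty array only affects what backs the returned `GenericArrayData`, not the result of the intersection.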


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/951/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/822/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203311755
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -85,9 +85,13 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
       protected var _capacity = nextPowerOf2(initialCapacity)
       protected var _mask = _capacity - 1
       protected var _size = 0
    +  protected var _occupied = 0
       protected var _growThreshold = (loadFactor * _capacity).toInt
    +  def g: Int = _growThreshold
    +  def o: Int = _occupied
     
       protected var _bitset = new BitSet(_capacity)
    +  protected var _bitsetDeleted: BitSet = null
    --- End diff --
    
    Why protected? Make it private instead.
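
    To illustrate the suggestion, here is a minimal sketch of deletion-tracking state that is kept `private` and allocated lazily. `SlotTracker` is a hypothetical class, not Spark's `OpenHashSet`, and it uses `java.util.BitSet` rather than Spark's own `BitSet`; it only shows that nothing outside the class needs to see the field.

```scala
import java.util.BitSet

// Hypothetical class for illustration only; not Spark's OpenHashSet.
class SlotTracker(capacity: Int) {
  private val occupied = new BitSet(capacity)
  // Private: subclasses and callers never touch the deletion marks directly.
  // Left null until the first removal so sets that never delete pay nothing.
  private var deleted: BitSet = null

  def insert(pos: Int): Unit = {
    occupied.set(pos)
    // Re-inserting into a previously deleted slot clears its deletion mark.
    if (deleted != null) deleted.clear(pos)
  }

  def remove(pos: Int): Unit = {
    if (occupied.get(pos)) {
      if (deleted == null) deleted = new BitSet(capacity)
      deleted.set(pos)
    }
  }

  def contains(pos: Int): Boolean =
    occupied.get(pos) && (deleted == null || !deleted.get(pos))
}
```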


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92813/testReport)** for PR 21102 at commit [`59ea8e2`](https://github.com/apache/spark/commit/59ea8e2f9f03f30e7c65153fa6d3c6acf1e70420).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast `
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike `


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1637/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89671/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    cc @ueshin 


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/831/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89527/testReport)** for PR 21102 at commit [`2602f8e`](https://github.com/apache/spark/commit/2602f8e7c730a2128ce993416d920281d8e228ee).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94267/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1041/
    Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203299689
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -114,6 +118,21 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
         rehashIfNeeded(k, grow, move)
       }
     
    +  /**
    +   * Remove an element from the set. If the element does not exist in the set, nothing is done.
    +   */
    +  def remove(k: T): Unit = {
    --- End diff --
    
    Maybe we should not add a `remove` method unless it can be added in a simple way. `OpenHashSet` is used in many places, so this change might affect their performance. How about using a different implementation?
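
One way to read the suggestion above is that an intersection can be computed without ever deleting from a hash set. The sketch below is an illustration only, not the implementation in this PR: `ArrayIntersectSketch` and `intersect` are hypothetical names, it uses `scala.collection.mutable.HashSet` rather than Spark's `OpenHashSet`, and it ignores SQL null semantics. It keeps one set for membership in the right array and a second set for elements already emitted, so duplicates are dropped by insertion alone.

```scala
import scala.collection.mutable

object ArrayIntersectSketch {
  // Hypothetical helper: returns the distinct elements of `left` that also
  // occur in `right`, in order of first appearance in `left`.
  def intersect[T](left: Seq[T], right: Seq[T]): List[T] = {
    // Membership set for the right-hand array; it is only ever added to.
    val inRight = mutable.HashSet.empty[T]
    inRight ++= right

    // Elements already written to the result; replaces any need for `remove`.
    val emitted = mutable.HashSet.empty[T]
    val result = mutable.ArrayBuffer.empty[T]

    for (elem <- left) {
      // `add` returns true only if `elem` was not in `emitted` before.
      if (inRight.contains(elem) && emitted.add(elem)) {
        result += elem
      }
    }
    result.toList
  }
}
```

The trade-off in this sketch is a second hash set in exchange for leaving the shared set class untouched.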


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93097/testReport)** for PR 21102 at commit [`05a612b`](https://github.com/apache/spark/commit/05a612bd613669b55ae793336953e1a91a94e164).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike`


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    cc @ueshin


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94264/testReport)** for PR 21102 at commit [`ce755e2`](https://github.com/apache/spark/commit/ce755e2b049ca000d6da754654b792e181e6d904).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94267/testReport)** for PR 21102 at commit [`33781b6`](https://github.com/apache/spark/commit/33781b640ed447d9a73a93b63e1834dd9360e72a).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89538/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94192/testReport)** for PR 21102 at commit [`ab9aa10`](https://github.com/apache/spark/commit/ab9aa10e0d2719800348caf67901140e89df8fe4).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike`


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92813/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94174/testReport)** for PR 21102 at commit [`ce1bfb0`](https://github.com/apache/spark/commit/ce1bfb04e774b3c18b31a33c13ce1a0cc0632419).
     * This patch **fails from timeout after a configured wait of `300m`**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike`


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1033/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93740/testReport)** for PR 21102 at commit [`c398291`](https://github.com/apache/spark/commit/c3982911d6195fc1bc1c63d72d4b3273f958accd).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94264/testReport)** for PR 21102 at commit [`ce755e2`](https://github.com/apache/spark/commit/ce755e2b049ca000d6da754654b792e181e6d904).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92833/testReport)** for PR 21102 at commit [`905c31a`](https://github.com/apache/spark/commit/905c31a5088139645ad6683bae406cc766ccb5a8).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93157/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93751/testReport)** for PR 21102 at commit [`c398291`](https://github.com/apache/spark/commit/c3982911d6195fc1bc1c63d72d4b3273f958accd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92323/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1018/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2460/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94266/testReport)** for PR 21102 at commit [`33781b6`](https://github.com/apache/spark/commit/33781b640ed447d9a73a93b63e1834dd9360e72a).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/904/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93691/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93155/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92998/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89527/testReport)** for PR 21102 at commit [`2602f8e`](https://github.com/apache/spark/commit/2602f8e7c730a2128ce993416d920281d8e228ee).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94267/testReport)** for PR 21102 at commit [`33781b6`](https://github.com/apache/spark/commit/33781b640ed447d9a73a93b63e1834dd9360e72a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93155/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93146/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92943/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92943/testReport)** for PR 21102 at commit [`fdc2f6c`](https://github.com/apache/spark/commit/fdc2f6c685754b4deb624daa2666aa53ab04f4ce).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92813/testReport)** for PR 21102 at commit [`59ea8e2`](https://github.com/apache/spark/commit/59ea8e2f9f03f30e7c65153fa6d3c6acf1e70420).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89517/testReport)** for PR 21102 at commit [`548a4b8`](https://github.com/apache/spark/commit/548a4b804472e062e36308274d1aff8909621131).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes `
      * `case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetUtils `


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93835/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1831/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92998/testReport)** for PR 21102 at commit [`5492572`](https://github.com/apache/spark/commit/5492572bb83b314cd31d116d3b344ae3c4596dbd).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93394/testReport)** for PR 21102 at commit [`0307c1d`](https://github.com/apache/spark/commit/0307c1ddedf2ecc8130d372a18e619fbdb4f5dc7).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92998/testReport)** for PR 21102 at commit [`5492572`](https://github.com/apache/spark/commit/5492572bb83b314cd31d116d3b344ae3c4596dbd).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94174/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93835/testReport)** for PR 21102 at commit [`9d5cd1e`](https://github.com/apache/spark/commit/9d5cd1eb003bfe5fc5ff91a27d5a17ba476fbdf0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r205930801
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -272,7 +272,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
     
       private def nextPowerOf2(n: Int): Int = {
         if (n == 0) {
    -      1
    +      2
    --- End diff --
    
    Oh, good catch.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1807/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92833/testReport)** for PR 21102 at commit [`905c31a`](https://github.com/apache/spark/commit/905c31a5088139645ad6683bae406cc766ccb5a8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92943/testReport)** for PR 21102 at commit [`fdc2f6c`](https://github.com/apache/spark/commit/fdc2f6c685754b4deb624daa2666aa53ab04f4ce).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92970/testReport)** for PR 21102 at commit [`fce9eb0`](https://github.com/apache/spark/commit/fce9eb09bf0666711dbb5584c56b2534e495dffc).


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r205853523
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -272,7 +272,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
     
       private def nextPowerOf2(n: Int): Int = {
         if (n == 0) {
    -      1
    +      2
    --- End diff --
    
    Good catch, thanks


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92833/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93129/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203314045
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -163,7 +187,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
        *                 to a new position (in the new data array).
        */
       def rehashIfNeeded(k: T, allocateFunc: (Int) => Unit, moveFunc: (Int, Int) => Unit) {
    -    if (_size > _growThreshold) {
    +    if (_occupied > _growThreshold) {
    --- End diff --
    
    I don't see any value in `_occupied`; on the contrary, it can cause very bad behavior if many removes are expected.
    `_size` is a better metric for deciding when to rehash and grow.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93129/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93740/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92947/testReport)** for PR 21102 at commit [`7d789e2`](https://github.com/apache/spark/commit/7d789e221dd6c6d4d7176dcec87a867ec5386a60).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92947/testReport)** for PR 21102 at commit [`7d789e2`](https://github.com/apache/spark/commit/7d789e221dd6c6d4d7176dcec87a867ec5386a60).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93002/testReport)** for PR 21102 at commit [`5492572`](https://github.com/apache/spark/commit/5492572bb83b314cd31d116d3b344ae3c4596dbd).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93097/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1211/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/923/
    Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r207767226
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3965,6 +4034,242 @@ object ArrayUnion {
       }
     }
     
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
    +  with ComplexTypeMergingExpression {
    +  override def dataType: DataType = {
    +    dataTypeCheck
    +    ArrayType(elementType,
    +      left.dataType.asInstanceOf[ArrayType].containsNull &&
    +        right.dataType.asInstanceOf[ArrayType].containsNull)
    +  }
    +
    +  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
    +    if (elementTypeSupportEquals) {
    +      (array1, array2) =>
    +        val hs = new OpenHashSet[Any]
    +        val hsResult = new OpenHashSet[Any]
    +        var foundNullElement = false
    +        var i = 0
    +        while (i < array2.numElements()) {
    +          if (array2.isNullAt(i)) {
    +            foundNullElement = true
    +          } else {
    +            val elem = array2.get(i, elementType)
    +            hs.add(elem)
    +          }
    +          i += 1
    +        }
    +        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +        i = 0
    +        while (i < array1.numElements()) {
    +          if (array1.isNullAt(i)) {
    +            if (foundNullElement) {
    +              arrayBuffer += null
    +              foundNullElement = false
    +            }
    +          } else {
    +            val elem = array1.get(i, elementType)
    +            if (hs.contains(elem) && !hsResult.contains(elem)) {
    +              arrayBuffer += elem
    +              hsResult.add(elem)
    +            }
    +          }
    +          i += 1
    +        }
    +        new GenericArrayData(arrayBuffer)
    +    } else {
    +      (array1, array2) =>
    +        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
    +        var alreadySeenNull = false
    +        var i = 0
    +        while (i < array1.numElements()) {
    +          var found = false
    +          val elem1 = array1.get(i, elementType)
    +          if (array1.isNullAt(i)) {
    +            if (!alreadySeenNull) {
    +              var j = 0
    +              while (!found && j < array2.numElements()) {
    +                found = array2.isNullAt(j)
    +                j += 1
    +              }
    +              // array2 is scanned only once for null element
    +              alreadySeenNull = true
    +            }
    +          } else {
    +            var j = 0
    +            while (!found && j < array2.numElements()) {
    +              if (!array2.isNullAt(j)) {
    +                val elem2 = array2.get(j, elementType)
    +                if (ordering.equiv(elem1, elem2)) {
    +                  // check whether elem1 is already stored in arrayBuffer
    +                  var foundArrayBuffer = false
    +                  var k = 0
    +                  while (!foundArrayBuffer && k < arrayBuffer.size) {
    +                    val va = arrayBuffer(k)
    +                    foundArrayBuffer = (va != null) && ordering.equiv(va, elem1)
    +                    k += 1
    +                  }
    +                  found = !foundArrayBuffer
    +                }
    +              }
    +              j += 1
    +            }
    +          }
    +          if (found) {
    +            arrayBuffer += elem1
    +          }
    +          i += 1
    +        }
    +        new GenericArrayData(arrayBuffer)
    +    }
    +  }
    +
    +  override def nullSafeEval(input1: Any, input2: Any): Any = {
    +    val array1 = input1.asInstanceOf[ArrayData]
    +    val array2 = input2.asInstanceOf[ArrayData]
    +
    +    evalIntersect(array1, array2)
    +  }
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val arrayData = classOf[ArrayData].getName
    +    val i = ctx.freshName("i")
    +    val value = ctx.freshName("value")
    +    val size = ctx.freshName("size")
    +    if (canUseSpecializedHashSet) {
    +      val jt = CodeGenerator.javaType(elementType)
    +      val ptName = CodeGenerator.primitiveTypeName(jt)
    +
    +      nullSafeCodeGen(ctx, ev, (array1, array2) => {
    +        val foundNullElement = ctx.freshName("foundNullElement")
    +        val nullElementIndex = ctx.freshName("nullElementIndex")
    +        val builder = ctx.freshName("builder")
    +        val openHashSet = classOf[OpenHashSet[_]].getName
    +        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
    +        val hashSet = ctx.freshName("hashSet")
    +        val hashSetResult = ctx.freshName("hashSetResult")
    +        val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
    --- End diff --
    
    nit: `classOf[mutable.ArrayBuilder[_]].getName`?
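
A minimal sketch of the suggestion above, written outside the PR's codegen: deriving the name with `classOf` yields the same string as the hard-coded literal, but the compiler checks that the class exists. The object name `ClassNameSketch` and its fields are hypothetical and only serve the illustration.

```scala
import scala.collection.mutable

object ClassNameSketch {
  // Hard-coded literal, as written in the diff under review.
  val literalName: String = "scala.collection.mutable.ArrayBuilder"

  // Name derived from the class itself; a typo in the class name
  // would be rejected at compile time instead of failing in generated code.
  val derivedName: String = classOf[mutable.ArrayBuilder[_]].getName

  def main(args: Array[String]): Unit = {
    println(derivedName == literalName)
  }
}
```

Both values can then be interpolated into a generated-code string in the same way.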


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    I agree with @ueshin. I wouldn't guarantee the order of the returned elements in the documentation yet, though.
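
For context on the ordering question, here is a rough plain-collections sketch of what the hash-based path in the diff under review appears to do: it walks the first array in order and emits each element that also occurs in the second array, at most once. This is not the PR's code. `IntersectSketch`, `intersect`, and the use of `Option[Int]` (with `None` standing in for a SQL NULL) are assumptions made for the illustration.

```scala
import scala.collection.mutable

object IntersectSketch {
  // Elements are modeled as Option[Int]; None stands in for a SQL NULL.
  def intersect(a1: Seq[Option[Int]], a2: Seq[Option[Int]]): Seq[Option[Int]] = {
    val inRight = a2.toSet                            // everything present in the second array
    val emitted = mutable.HashSet.empty[Option[Int]]  // elements already written to the result
    val out = mutable.ArrayBuffer.empty[Option[Int]]
    for (e <- a1) {
      // Keep an element only if the second array has it and it was not emitted before.
      // HashSet.add returns true only when the element was newly inserted.
      if (inRight.contains(e) && emitted.add(e)) out += e
    }
    out.toSeq
  }
}
```

In this sketch the result follows the order of first appearance in the first array, which is an implementation detail rather than something the documentation promises.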


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203322056
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -163,7 +187,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
        *                 to a new position (in the new data array).
        */
       def rehashIfNeeded(k: T, allocateFunc: (Int) => Unit, moveFunc: (Int, Int) => Unit) {
    -    if (_size > _growThreshold) {
    +    if (_occupied > _growThreshold) {
    --- End diff --
    
    There is no explicit entry here; these are simply unoccupied slots in an array.
    Once a slot is free, it can be used by some other (new) entry when insert is called.
    
    It should be easy to see how very bad behavior can arise even when the actual size of the set is very small: a series of adds and removes results in unending growth of the set.
    
    Something like this, for example, is enough to blow the set up to 2B entries:
    ```
    var i = 0
    while (i < Int.MaxValue) {
      set.add(1)
      set.remove(1)
      assert (0 == set.size)
      i += 1
    }
    ```
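
The concern can be illustrated with a counter-only model of the growth policy. This is a sketch under stated assumptions, not the `OpenHashSet` code from the PR: it does no hashing, assumes that a remove leaves a tombstone so the occupied count never drops on remove, assumes that growing the table rehashes only live elements, and uses an initial capacity of 64 and a load factor of 0.7 as illustrative values. `GrowthPolicySketch`, `Model`, and `capacityAfter` are hypothetical names.

```scala
object GrowthPolicySketch {
  // Counter-only model of an open-addressing set: bookkeeping, no hashing.
  final class Model(useOccupied: Boolean, loadFactor: Double = 0.7) {
    var capacity = 64
    var size = 0      // live elements
    var occupied = 0  // live elements plus tombstones (assumption)

    private def threshold: Int = (loadFactor * capacity).toInt

    def add(): Unit = {
      size += 1
      occupied += 1
      // The policy under discussion: which counter triggers growth.
      val metric = if (useOccupied) occupied else size
      if (metric > threshold) {
        capacity *= 2    // grow and rehash
        occupied = size  // rehashing keeps only live elements
      }
    }

    // Removing drops the live count but leaves the slot marked as used.
    def remove(): Unit = size -= 1
  }

  // Runs n add/remove pairs and returns the final capacity.
  def capacityAfter(n: Int, useOccupied: Boolean): Int = {
    val m = new Model(useOccupied)
    var i = 0
    while (i < n) { m.add(); m.remove(); i += 1 }
    m.capacity
  }
}
```

With the size-based check the model stays at its initial capacity no matter how many add/remove pairs run, because the live size never exceeds one. With the occupied-based check the capacity keeps doubling as the loop continues, even though the set is empty after every iteration.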



---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21102


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94231/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2449/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93002/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93740/testReport)** for PR 21102 at commit [`c398291`](https://github.com/apache/spark/commit/c3982911d6195fc1bc1c63d72d4b3273f958accd).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94213/testReport)** for PR 21102 at commit [`6fba1ee`](https://github.com/apache/spark/commit/6fba1ee8c3525a6f34bf5580737d067a8f0d976d).


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r205930794
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3805,3 +3801,339 @@ object ArrayUnion {
         new GenericArrayData(arrayBuffer)
       }
     }
    +
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    +  """,
    +  examples = """
    +    Examples:Fun
    +      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
    +       array(1, 3)
    +  """,
    +  since = "2.4.0")
    +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike {
    +  override def dataType: DataType = ArrayType(elementType,
    +    left.dataType.asInstanceOf[ArrayType].containsNull &&
    +      right.dataType.asInstanceOf[ArrayType].containsNull)
    +
    +  var hsInt: OpenHashSet[Int] = _
    +  var hsResultInt: OpenHashSet[Int] = _
    +  var hsLong: OpenHashSet[Long] = _
    +  var hsResultLong: OpenHashSet[Long] = _
    +
    +  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
    +    val elem = array.getInt(idx)
    +    if (hsInt.contains(elem) && !hsResultInt.contains(elem)) {
    +      if (resultArray != null) {
    +        resultArray.setInt(pos, elem)
    +      }
    +      hsResultInt.add(elem)
    +      true
    +    } else {
    +      false
    +    }
    +  }
    +
    +  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
    +    val elem = array.getLong(idx)
    +    if (hsLong.contains(elem) && !hsResultLong.contains(elem)) {
    +      if (resultArray != null) {
    +        resultArray.setLong(pos, elem)
    +      }
    +      hsResultLong.add(elem)
    +      true
    +    } else {
    +      false
    +    }
    +  }
    +
    +  def evalIntLongPrimitiveType(
    +      array1: ArrayData,
    +      array2: ArrayData,
    +      resultArray: ArrayData,
    +      initFoundNullElement: Boolean,
    +      isLongType: Boolean): (Int, Boolean) = {
    +    // store elements into resultArray
    +    var i = 0
    +    var foundNullElement = initFoundNullElement
    +    if (resultArray == null) {
    +      // hsInt or hsLong is updated only once since it is not changed
    +      while (i < array1.numElements()) {
    --- End diff --
    
    You are right, fixed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/819/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #94221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94221/testReport)** for PR 21102 at commit [`6fba1ee`](https://github.com/apache/spark/commit/6fba1ee8c3525a6f34bf5580737d067a8f0d976d).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92970/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93398/testReport)** for PR 21102 at commit [`0307c1d`](https://github.com/apache/spark/commit/0307c1ddedf2ecc8130d372a18e619fbdb4f5dc7).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92970/testReport)** for PR 21102 at commit [`fce9eb0`](https://github.com/apache/spark/commit/fce9eb09bf0666711dbb5584c56b2534e495dffc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93399/testReport)** for PR 21102 at commit [`a8acfba`](https://github.com/apache/spark/commit/a8acfba09afceda30c1b3628bbb691aa71e614cb).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1210/
    Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203319710
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -163,7 +187,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
        *                 to a new position (in the new data array).
        */
       def rehashIfNeeded(k: T, allocateFunc: (Int) => Unit, moveFunc: (Int, Int) => Unit) {
    -    if (_size > _growThreshold) {
    +    if (_occupied > _growThreshold) {
    --- End diff --
    
    When 'remove' is called, '_size' is decremented, but the entry is not released. This is the motivation for introducing '_occupied'.
    I will try another implementation that does not use 'remove', although it may introduce some overhead.
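
One way an intersection can avoid `remove` entirely is to keep a second set of already emitted elements, which is the shape of the `hsInt` / `hsResultInt` pair in the `ArrayIntersect` diff quoted elsewhere in this thread. The following is a minimal sketch under assumptions: it uses `scala.collection.mutable.HashSet` as a stand-in for `OpenHashSet`, works on plain `Array[Int]` rather than `ArrayData`, ignores nulls, and the object name `IntersectWithoutRemove` is hypothetical.

```scala
import scala.collection.mutable

// Hypothetical stand-alone sketch; not the code merged in the PR.
object IntersectWithoutRemove {
  def intersect(array1: Array[Int], array2: Array[Int]): Array[Int] = {
    // Membership set built once from array2 and never modified afterwards.
    val inArray2 = mutable.HashSet.empty[Int]
    array2.foreach(inArray2 += _)

    // Elements already written to the result, used to drop duplicates.
    val emitted = mutable.HashSet.empty[Int]
    val result = mutable.ArrayBuffer.empty[Int]

    var i = 0
    while (i < array1.length) {
      val elem = array1(i)
      if (inArray2.contains(elem) && !emitted.contains(elem)) {
        result += elem
        emitted += elem
      }
      i += 1
    }
    result.toArray
  }
}
```

The overhead mentioned in the comment shows up here as the extra `emitted` set: both sets only ever grow, so no slot is left behind by a removal.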


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93979/testReport)** for PR 21102 at commit [`c6b9a41`](https://github.com/apache/spark/commit/c6b9a417217938dacb52eaf02858e45387e511ae).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93157/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93394/testReport)** for PR 21102 at commit [`0307c1d`](https://github.com/apache/spark/commit/0307c1ddedf2ecc8130d372a18e619fbdb4f5dc7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94221/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93155/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89538/testReport)** for PR 21102 at commit [`038e98c`](https://github.com/apache/spark/commit/038e98c1f3603013cb9b67562305b0863805afcf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1786/
    Test PASSed.


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r203322643
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
    @@ -163,7 +187,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
        *                 to a new position (in the new data array).
        */
       def rehashIfNeeded(k: T, allocateFunc: (Int) => Unit, moveFunc: (Int, Int) => Unit) {
    -    if (_size > _growThreshold) {
    +    if (_occupied > _growThreshold) {
    --- End diff --
    
    For accuracy's sake - my example snippet above will fail much earlier, due to OpenHashSet.MAX_CAPACITY. Though that is probably not the point anyway :-)


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94266/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92818/testReport)** for PR 21102 at commit [`cf7a27c`](https://github.com/apache/spark/commit/cf7a27c00ae468ea2b76ec6bf8c75b0e57e41b33).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92814/testReport)** for PR 21102 at commit [`346274d`](https://github.com/apache/spark/commit/346274d577e7aef513477952333dfe0b431b5b2d).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #89671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89671/testReport)** for PR 21102 at commit [`cd56b7d`](https://github.com/apache/spark/commit/cd56b7dcecf8228cb92ac40e028ac35d028065f5).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    retest this please


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93979/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93399/testReport)** for PR 21102 at commit [`a8acfba`](https://github.com/apache/spark/commit/a8acfba09afceda30c1b3628bbb691aa71e614cb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93979/testReport)** for PR 21102 at commit [`c6b9a41`](https://github.com/apache/spark/commit/c6b9a417217938dacb52eaf02858e45387e511ae).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93129/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94213/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Jenkins, retest this please.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #93691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93691/testReport)** for PR 21102 at commit [`a717ec9`](https://github.com/apache/spark/commit/a717ec9bb6d8bc0f907ff60bb568b7659936031d).


---



[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r223460909
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -3965,6 +4034,248 @@ object ArrayUnion {
       }
     }
     
    +/**
    + * Returns an array of the elements in the intersect of x and y, without duplicates
    + */
    +@ExpressionDescription(
    +  usage = """
    +  _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
    +    array2, without duplicates.
    --- End diff --
    
    It sounds like our null handling is incorrect. NULL does not equal NULL.
    ```
    SELECT array_intersect(ARRAY(NULL), ARRAY(NULL));
    ```
    
    This should return an empty set. 
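
The expectation stated in this comment can be modelled outside Spark with `Option`, where `None` plays the role of NULL. This is only a sketch of the reviewer's argument, not a statement of how Spark's `array_intersect` behaves: the object `NullAwareIntersect` and both of its functions are hypothetical, and the comparison rule (unknown when either side is NULL) is the assumption being illustrated.

```scala
// Hypothetical model of SQL-style equality; not Spark code.
object NullAwareIntersect {
  // Equality is unknown (None) as soon as either operand is NULL (None).
  def sqlEquals(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Keeps an element of `left` only if it is definitely equal to some element of `right`.
  def intersect(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] =
    left.distinct.filter(l => right.exists(r => sqlEquals(l, r).contains(true)))
}
```

With this rule, `NullAwareIntersect.intersect(Seq(None), Seq(None))` is empty, because comparing NULL with NULL never yields a definite `true`.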


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    **[Test build #92818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92818/testReport)** for PR 21102 at commit [`cf7a27c`](https://github.com/apache/spark/commit/cf7a27c00ae468ea2b76ec6bf8c75b0e57e41b33).


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1207/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1524/
    Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94264/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94192/
    Test FAILed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1043/
    Test PASSed.


---
