You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/08/02 09:28:48 UTC

[GitHub] spark pull request #21966: simplify codegen of array_except

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21966

    simplify codegen of array_except

    ## What changes were proposed in this pull request?
    
    simplify the codegen:
    1. only do real codegen if the type can be specialized by the hash set
    2. change the null handling. Before: track the nullElementIndex, and create a new ArrayData to insert the null in the middle. After: track the nullElementIndex, put a null placeholder in the ArrayBuilder, at the end create ArrayData from ArrayBuilder directly.
    
    ## How was this patch tested?
    
    existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21966.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21966
    
----
commit 313ad294ec07d54f06db92f635b6bb9a76f17343
Author: Wenchen Fan <we...@...>
Date:   2018-08-02T09:18:21Z

    simplify codegen of array_except

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21966


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    **[Test build #94116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94116/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1743/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Thanks! merging to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    **[Test build #94116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94116/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: simplify codegen of array_except

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    cc @kiszk @ueshin 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    **[Test build #93986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93986/testReport)** for PR 21966 at commit [`313ad29`](https://github.com/apache/spark/commit/313ad294ec07d54f06db92f635b6bb9a76f17343).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1732/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94116/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94099/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21966#discussion_r207302021
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -4077,81 +4078,84 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val arrayData = classOf[ArrayData].getName
         val i = ctx.freshName("i")
    -    val pos = ctx.freshName("pos")
         val value = ctx.freshName("value")
    -    val hsValue = ctx.freshName("hsValue")
         val size = ctx.freshName("size")
    -    if (elementTypeSupportEquals) {
    -      val ptName = CodeGenerator.primitiveTypeName(elementType)
    -      val unsafeArray = ctx.freshName("unsafeArray")
    -      val (postFix, openHashElementType, hsJavaTypeName, genHsValue,
    -           getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) =
    -        elementType match {
    -          case ByteType | ShortType | IntegerType =>
    -            ("$mcI$sp", "Int", "int", s"(int) $value",
    -              s"get$ptName($i)", s"set$ptName($pos, $value)",
    -              CodeGenerator.javaType(elementType), ptName,
    -              s"""
    -                 |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
    -                 |${ev.value} = $unsafeArray;
    -               """.stripMargin)
    -          case LongType | FloatType | DoubleType =>
    -            val signature = elementType match {
    -              case LongType => "$mcJ$sp"
    -              case FloatType => "$mcF$sp"
    -              case DoubleType => "$mcD$sp"
    -            }
    -            (signature, CodeGenerator.boxedType(elementType),
    -              CodeGenerator.javaType(elementType), value,
    -              s"get$ptName($i)", s"set$ptName($pos, $value)",
    -              CodeGenerator.javaType(elementType), ptName,
    -              s"""
    -                 |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
    -                 |${ev.value} = $unsafeArray;
    -               """.stripMargin)
    -          case _ =>
    -            val genericArrayData = classOf[GenericArrayData].getName
    -            val et = ctx.addReferenceObj("elementType", elementType)
    -            ("", "Object", "Object", value,
    -              s"get($i, $et)", s"update($pos, $value)", "Object", "Ref",
    -              s"${ev.value} = new $genericArrayData(new Object[$size]);")
    -        }
    +    val canUseSpecializedHashSet = elementType match {
    +      case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true
    +      case _ => false
    +    }
    +    if (canUseSpecializedHashSet) {
    +      val jt = CodeGenerator.javaType(elementType)
    +      val ptName = CodeGenerator.primitiveTypeName(jt)
    +
    +      def genGetValue(array: String): String =
    +        CodeGenerator.getValue(array, elementType, i)
    +
    +      val (hsPostFix, hsTypeName) = elementType match {
    +        // we cast byte/short to int when writing to the hash set.
    +        case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
    +        case LongType => ("$mcJ$sp", ptName)
    +        case FloatType => ("$mcF$sp", ptName)
    +        case DoubleType => ("$mcD$sp", ptName)
    +      }
    +
    +      // we cast byte/short to int when writing to the hash set.
    +      val hsValueCast = elementType match {
    +        case ByteType | ShortType => "(int) "
    +        case _ => ""
    +      }
     
           nullSafeCodeGen(ctx, ev, (array1, array2) => {
             val notFoundNullElement = ctx.freshName("notFoundNullElement")
             val nullElementIndex = ctx.freshName("nullElementIndex")
             val builder = ctx.freshName("builder")
    -        val array = ctx.freshName("array")
             val openHashSet = classOf[OpenHashSet[_]].getName
    -        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$openHashElementType()"
    -        val hs = ctx.freshName("hs")
    +        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
    +        val hashSet = ctx.freshName("hashSet")
             val genericArrayData = classOf[GenericArrayData].getName
             val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
    -        val arrayBuilderClass = s"$arrayBuilder$$of$primitiveTypeName"
    -        val arrayBuilderClassTag = if (primitiveTypeName != "Ref") {
    -          s"scala.reflect.ClassTag$$.MODULE$$.$primitiveTypeName()"
    -        } else {
    -          s"scala.reflect.ClassTag$$.MODULE$$.AnyRef()"
    -        }
    +        val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
    +        val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
     
    -        def withArray2NullCheck(body: String) =
    -          if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    -            s"""
    -               |if ($array2.isNullAt($i)) {
    -               |  $notFoundNullElement = false;
    -               |} else {
    -               |  $body
    -               |}
    +        def withArray2NullCheck(body: String): String =
    +          if (left.dataType.asInstanceOf[ArrayType].containsNull) {
    --- End diff --
    
    Is it better to use the following structure to make `else` clause common?
    
    ```
    if (right.dataType.asInstanceOf[ArrayType].containsNull) {
      if (left.dataType.asInstanceOf[ArrayType].containsNull) {
        ...
      } else {
        ...
      }
    } else {
      body
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    **[Test build #94099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94099/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21966#discussion_r207289175
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -4077,81 +4078,84 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val arrayData = classOf[ArrayData].getName
         val i = ctx.freshName("i")
    -    val pos = ctx.freshName("pos")
         val value = ctx.freshName("value")
    -    val hsValue = ctx.freshName("hsValue")
         val size = ctx.freshName("size")
    -    if (elementTypeSupportEquals) {
    -      val ptName = CodeGenerator.primitiveTypeName(elementType)
    -      val unsafeArray = ctx.freshName("unsafeArray")
    -      val (postFix, openHashElementType, hsJavaTypeName, genHsValue,
    -           getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) =
    -        elementType match {
    -          case ByteType | ShortType | IntegerType =>
    -            ("$mcI$sp", "Int", "int", s"(int) $value",
    -              s"get$ptName($i)", s"set$ptName($pos, $value)",
    -              CodeGenerator.javaType(elementType), ptName,
    -              s"""
    -                 |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
    -                 |${ev.value} = $unsafeArray;
    -               """.stripMargin)
    -          case LongType | FloatType | DoubleType =>
    -            val signature = elementType match {
    -              case LongType => "$mcJ$sp"
    -              case FloatType => "$mcF$sp"
    -              case DoubleType => "$mcD$sp"
    -            }
    -            (signature, CodeGenerator.boxedType(elementType),
    -              CodeGenerator.javaType(elementType), value,
    -              s"get$ptName($i)", s"set$ptName($pos, $value)",
    -              CodeGenerator.javaType(elementType), ptName,
    -              s"""
    -                 |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
    -                 |${ev.value} = $unsafeArray;
    -               """.stripMargin)
    -          case _ =>
    -            val genericArrayData = classOf[GenericArrayData].getName
    -            val et = ctx.addReferenceObj("elementType", elementType)
    -            ("", "Object", "Object", value,
    -              s"get($i, $et)", s"update($pos, $value)", "Object", "Ref",
    -              s"${ev.value} = new $genericArrayData(new Object[$size]);")
    -        }
    +    val canUseSpecializedHashSet = elementType match {
    +      case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true
    +      case _ => false
    +    }
    +    if (canUseSpecializedHashSet) {
    +      val jt = CodeGenerator.javaType(elementType)
    +      val ptName = CodeGenerator.primitiveTypeName(jt)
    +
    +      def genGetValue(array: String): String =
    +        CodeGenerator.getValue(array, elementType, i)
    +
    +      val (hsPostFix, hsTypeName) = elementType match {
    +        // we cast byte/short to int when writing to the hash set.
    +        case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
    +        case LongType => ("$mcJ$sp", ptName)
    +        case FloatType => ("$mcF$sp", ptName)
    +        case DoubleType => ("$mcD$sp", ptName)
    +      }
    +
    +      // we cast byte/short to int when writing to the hash set.
    +      val hsValueCast = elementType match {
    +        case ByteType | ShortType => "(int) "
    +        case _ => ""
    +      }
     
           nullSafeCodeGen(ctx, ev, (array1, array2) => {
             val notFoundNullElement = ctx.freshName("notFoundNullElement")
             val nullElementIndex = ctx.freshName("nullElementIndex")
             val builder = ctx.freshName("builder")
    -        val array = ctx.freshName("array")
             val openHashSet = classOf[OpenHashSet[_]].getName
    -        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$openHashElementType()"
    -        val hs = ctx.freshName("hs")
    +        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
    +        val hashSet = ctx.freshName("hashSet")
             val genericArrayData = classOf[GenericArrayData].getName
             val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
    -        val arrayBuilderClass = s"$arrayBuilder$$of$primitiveTypeName"
    -        val arrayBuilderClassTag = if (primitiveTypeName != "Ref") {
    -          s"scala.reflect.ClassTag$$.MODULE$$.$primitiveTypeName()"
    -        } else {
    -          s"scala.reflect.ClassTag$$.MODULE$$.AnyRef()"
    -        }
    +        val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
    +        val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
     
    -        def withArray2NullCheck(body: String) =
    -          if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    -            s"""
    -               |if ($array2.isNullAt($i)) {
    -               |  $notFoundNullElement = false;
    -               |} else {
    -               |  $body
    -               |}
    +        def withArray2NullCheck(body: String): String =
    +          if (left.dataType.asInstanceOf[ArrayType].containsNull) {
    +            if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    +              s"""
    +                 |if ($array2.isNullAt($i)) {
    +                 |  $notFoundNullElement = false;
    +                 |} else {
    +                 |  $body
    +                 |}
                  """.stripMargin
    +            } else {
    +              body
    +            }
               } else {
    -            body
    +            // if array1's element is not nullable, we don't need to track the null element index.
    +            if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    +              s"""
    +                 |if (!$array2.isNullAt($i)) {
    +                 |  $body
    +                 |}
    +               """.stripMargin
    +            } else {
    +              body
    +            }
               }
    -        val array2Body =
    +
    +        val writeArray2ToHashSet = withArray2NullCheck(
               s"""
    -             |$javaTypeName $value = $array2.$getter;
    -             |$hsJavaTypeName $hsValue = $genHsValue;
    -             |$hs.add$postFix($hsValue);
    -           """.stripMargin
    +             |$jt $value = ${genGetValue(array2)};
    +             |$hashSet.add$hsPostFix($hsValueCast$value);
    +           """.stripMargin)
    +
    +        // When hitting a null vale, put a null holder in the ArrayBuilder. Finally we will
    +        // convert ArrayBuilder to ArrayData and setNull on the slot with null holder.
    +        val nullValueHolder = elementType match {
    +          case ByteType => "(byte) 0"
    +          case ShortType => "(short ) 0"
    --- End diff --
    
    nit: extra space after `short`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    **[Test build #93986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93986/testReport)** for PR 21966 at commit [`313ad29`](https://github.com/apache/spark/commit/313ad294ec07d54f06db92f635b6bb9a76f17343).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    LGTM except some comments


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21966: [SPARK-23915][SQL][followup] Add array_except fun...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21966#discussion_r207302241
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -4077,81 +4078,84 @@ case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val arrayData = classOf[ArrayData].getName
         val i = ctx.freshName("i")
    -    val pos = ctx.freshName("pos")
         val value = ctx.freshName("value")
    -    val hsValue = ctx.freshName("hsValue")
         val size = ctx.freshName("size")
    -    if (elementTypeSupportEquals) {
    -      val ptName = CodeGenerator.primitiveTypeName(elementType)
    -      val unsafeArray = ctx.freshName("unsafeArray")
    -      val (postFix, openHashElementType, hsJavaTypeName, genHsValue,
    -           getter, setter, javaTypeName, primitiveTypeName, arrayDataBuilder) =
    -        elementType match {
    -          case ByteType | ShortType | IntegerType =>
    -            ("$mcI$sp", "Int", "int", s"(int) $value",
    -              s"get$ptName($i)", s"set$ptName($pos, $value)",
    -              CodeGenerator.javaType(elementType), ptName,
    -              s"""
    -                 |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
    -                 |${ev.value} = $unsafeArray;
    -               """.stripMargin)
    -          case LongType | FloatType | DoubleType =>
    -            val signature = elementType match {
    -              case LongType => "$mcJ$sp"
    -              case FloatType => "$mcF$sp"
    -              case DoubleType => "$mcD$sp"
    -            }
    -            (signature, CodeGenerator.boxedType(elementType),
    -              CodeGenerator.javaType(elementType), value,
    -              s"get$ptName($i)", s"set$ptName($pos, $value)",
    -              CodeGenerator.javaType(elementType), ptName,
    -              s"""
    -                 |${ctx.createUnsafeArray(unsafeArray, size, elementType, s" $prettyName failed.")}
    -                 |${ev.value} = $unsafeArray;
    -               """.stripMargin)
    -          case _ =>
    -            val genericArrayData = classOf[GenericArrayData].getName
    -            val et = ctx.addReferenceObj("elementType", elementType)
    -            ("", "Object", "Object", value,
    -              s"get($i, $et)", s"update($pos, $value)", "Object", "Ref",
    -              s"${ev.value} = new $genericArrayData(new Object[$size]);")
    -        }
    +    val canUseSpecializedHashSet = elementType match {
    +      case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true
    +      case _ => false
    +    }
    +    if (canUseSpecializedHashSet) {
    +      val jt = CodeGenerator.javaType(elementType)
    +      val ptName = CodeGenerator.primitiveTypeName(jt)
    +
    +      def genGetValue(array: String): String =
    +        CodeGenerator.getValue(array, elementType, i)
    +
    +      val (hsPostFix, hsTypeName) = elementType match {
    +        // we cast byte/short to int when writing to the hash set.
    +        case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
    +        case LongType => ("$mcJ$sp", ptName)
    +        case FloatType => ("$mcF$sp", ptName)
    +        case DoubleType => ("$mcD$sp", ptName)
    +      }
    +
    +      // we cast byte/short to int when writing to the hash set.
    +      val hsValueCast = elementType match {
    +        case ByteType | ShortType => "(int) "
    +        case _ => ""
    +      }
     
           nullSafeCodeGen(ctx, ev, (array1, array2) => {
             val notFoundNullElement = ctx.freshName("notFoundNullElement")
             val nullElementIndex = ctx.freshName("nullElementIndex")
             val builder = ctx.freshName("builder")
    -        val array = ctx.freshName("array")
             val openHashSet = classOf[OpenHashSet[_]].getName
    -        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$openHashElementType()"
    -        val hs = ctx.freshName("hs")
    +        val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
    +        val hashSet = ctx.freshName("hashSet")
             val genericArrayData = classOf[GenericArrayData].getName
             val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
    -        val arrayBuilderClass = s"$arrayBuilder$$of$primitiveTypeName"
    -        val arrayBuilderClassTag = if (primitiveTypeName != "Ref") {
    -          s"scala.reflect.ClassTag$$.MODULE$$.$primitiveTypeName()"
    -        } else {
    -          s"scala.reflect.ClassTag$$.MODULE$$.AnyRef()"
    -        }
    +        val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
    +        val arrayBuilderClassTag = s"scala.reflect.ClassTag$$.MODULE$$.$ptName()"
     
    -        def withArray2NullCheck(body: String) =
    -          if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    -            s"""
    -               |if ($array2.isNullAt($i)) {
    -               |  $notFoundNullElement = false;
    -               |} else {
    -               |  $body
    -               |}
    +        def withArray2NullCheck(body: String): String =
    +          if (left.dataType.asInstanceOf[ArrayType].containsNull) {
    +            if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    +              s"""
    +                 |if ($array2.isNullAt($i)) {
    +                 |  $notFoundNullElement = false;
    +                 |} else {
    +                 |  $body
    +                 |}
                  """.stripMargin
    +            } else {
    +              body
    +            }
               } else {
    -            body
    +            // if array1's element is not nullable, we don't need to track the null element index.
    +            if (right.dataType.asInstanceOf[ArrayType].containsNull) {
    +              s"""
    +                 |if (!$array2.isNullAt($i)) {
    +                 |  $body
    +                 |}
    +               """.stripMargin
    +            } else {
    +              body
    +            }
               }
    -        val array2Body =
    +
    +        val writeArray2ToHashSet = withArray2NullCheck(
               s"""
    -             |$javaTypeName $value = $array2.$getter;
    -             |$hsJavaTypeName $hsValue = $genHsValue;
    -             |$hs.add$postFix($hsValue);
    -           """.stripMargin
    +             |$jt $value = ${genGetValue(array2)};
    +             |$hashSet.add$hsPostFix($hsValueCast$value);
    +           """.stripMargin)
    +
    +        // When hitting a null vale, put a null holder in the ArrayBuilder. Finally we will
    --- End diff --
    
    nit: `vale` -> `value`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1643/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93986/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21966: [SPARK-23915][SQL][followup] Add array_except function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21966
  
    **[Test build #94099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94099/testReport)** for PR 21966 at commit [`16b9949`](https://github.com/apache/spark/commit/16b9949285c7133b89b3e6624cd8f5684abd3df5).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org