You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/09/17 05:39:03 UTC

[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/22439

    [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreateArrayData method

    ## What changes were proposed in this pull request?
    
    This PR makes `GenArrayData.genCodeToCreateArrayData` method simple by using `ArrayData.createArrayData` method.
    
    Before this PR, `genCodeToCreateArrayData` method was complicated
    * Generated a temporary Java array to create `ArrayData`
    * Had separate code generation path to assign values for `GenericArrayData` and `UnsafeArrayData`
    
    After this PR, the method
    * Directly generates `GenericArrayData` or `UnsafeArrayData` without a temporary array
    * Has only code generation path to assign values
    
    ## How was this patch tested?
    
    Existing UTs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-25444

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22439
    
----
commit 24fbf742fdd8490f57d29325100036e556847c77
Author: Kazuaki Ishizaki <is...@...>
Date:   2018-09-17T05:28:00Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    **[Test build #96120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96120/testReport)** for PR 22439 at commit [`24fbf74`](https://github.com/apache/spark/commit/24fbf742fdd8490f57d29325100036e556847c77).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r218090048
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +74,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
    -   * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
    +   * @param elementsExpr concatenated set of [[Expression]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
    -      elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      elementsExpr: Seq[Expression],
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsExpr.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    -        val isNullAssignment = if (!isMapKey) {
    -          s"$arrayName[$i] = null;"
    -        } else {
    -          "throw new RuntimeException(\"Cannot use null as map key!\");"
    -        }
    -        eval.code + s"""
    -         if (${eval.isNull}) {
    -           $isNullAssignment
    -         } else {
    -           $arrayName[$i] = ${eval.value};
    -         }
    -       """
    -      }
    -      val assignmentString = ctx.splitExpressionsWithCurrentInputs(
    -        expressions = assignments,
    -        funcName = "apply",
    -        extraArguments = ("Object[]", arrayName) :: Nil)
    -
    -      (s"Object[] $arrayName = new Object[$numElements];",
    -       assignmentString,
    -       s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
    -       arrayDataName)
    -    } else {
    -      val arrayName = ctx.freshName("array")
    -      val unsafeArraySizeInBytes =
    -        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
    -        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
    -      val baseOffset = Platform.BYTE_ARRAY_OFFSET
    -
    -      val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsExpr.zipWithIndex.map { case (expr, i) =>
    +      val eval = expr.genCode(ctx)
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (!expr.nullable || eval.isNull == FalseLiteral) {
    --- End diff --
    
    I see, looks redudant


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    **[Test build #96144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96144/testReport)** for PR 22439 at commit [`fff3361`](https://github.com/apache/spark/commit/fff3361276519ae93c265a029290f5ba9f70945a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r218030505
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    --- End diff --
    
    Maybe we don't need `|| eval.isNull == FalseLiteral`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3146/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r217967158
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    --- End diff --
    
    How about taking an argument `elements: Seq[Expression]` instead of `elementCodes: Seq[ExprCode]` and use `element.nullable` instead of `eval.isNull` if we want to skip null-check when the nullable is false?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r218048161
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +74,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
    -   * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
    +   * @param elementsExpr concatenated set of [[Expression]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
    -      elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      elementsExpr: Seq[Expression],
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsExpr.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    -        val isNullAssignment = if (!isMapKey) {
    -          s"$arrayName[$i] = null;"
    -        } else {
    -          "throw new RuntimeException(\"Cannot use null as map key!\");"
    -        }
    -        eval.code + s"""
    -         if (${eval.isNull}) {
    -           $isNullAssignment
    -         } else {
    -           $arrayName[$i] = ${eval.value};
    -         }
    -       """
    -      }
    -      val assignmentString = ctx.splitExpressionsWithCurrentInputs(
    -        expressions = assignments,
    -        funcName = "apply",
    -        extraArguments = ("Object[]", arrayName) :: Nil)
    -
    -      (s"Object[] $arrayName = new Object[$numElements];",
    -       assignmentString,
    -       s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
    -       arrayDataName)
    -    } else {
    -      val arrayName = ctx.freshName("array")
    -      val unsafeArraySizeInBytes =
    -        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
    -        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
    -      val baseOffset = Platform.BYTE_ARRAY_OFFSET
    -
    -      val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsExpr.zipWithIndex.map { case (expr, i) =>
    +      val eval = expr.genCode(ctx)
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (!expr.nullable || eval.isNull == FalseLiteral) {
    --- End diff --
    
    Maybe we don't need `|| eval.isNull == FalseLiteral`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96144/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Thanks! merging to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96120/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    **[Test build #96133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96133/testReport)** for PR 22439 at commit [`2286b6e`](https://github.com/apache/spark/commit/2286b6e8667dfde38ef5a7a47633ac34e29b7c69).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r217965632
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    +        s"\n$setArrayElement\n"
    +      } else {
             val isNullAssignment = if (!isMapKey) {
    -          s"$arrayName[$i] = null;"
    +          s"$arrayDataName.setNullAt($i);"
             } else {
               "throw new RuntimeException(\"Cannot use null as map key!\");"
             }
    -        eval.code + s"""
    -         if (${eval.isNull}) {
    -           $isNullAssignment
    -         } else {
    -           $arrayName[$i] = ${eval.value};
    -         }
    -       """
    -      }
    -      val assignmentString = ctx.splitExpressionsWithCurrentInputs(
    -        expressions = assignments,
    -        funcName = "apply",
    -        extraArguments = ("Object[]", arrayName) :: Nil)
    -
    -      (s"Object[] $arrayName = new Object[$numElements];",
    -       assignmentString,
    -       s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
    -       arrayDataName)
    -    } else {
    -      val arrayName = ctx.freshName("array")
    -      val unsafeArraySizeInBytes =
    -        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
    -        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
    -      val baseOffset = Platform.BYTE_ARRAY_OFFSET
    -
    -      val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    -        val isNullAssignment = if (!isMapKey) {
    -          s"$arrayDataName.setNullAt($i);"
    +
    +        if (eval.isNull == TrueLiteral) {
    --- End diff --
    
    Do we need this case?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r218032925
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    +        s"\n$setArrayElement\n"
    +      } else {
             val isNullAssignment = if (!isMapKey) {
    -          s"$arrayName[$i] = null;"
    +          s"$arrayDataName.setNullAt($i);"
             } else {
               "throw new RuntimeException(\"Cannot use null as map key!\");"
             }
    -        eval.code + s"""
    -         if (${eval.isNull}) {
    -           $isNullAssignment
    -         } else {
    -           $arrayName[$i] = ${eval.value};
    -         }
    -       """
    -      }
    -      val assignmentString = ctx.splitExpressionsWithCurrentInputs(
    -        expressions = assignments,
    -        funcName = "apply",
    -        extraArguments = ("Object[]", arrayName) :: Nil)
    -
    -      (s"Object[] $arrayName = new Object[$numElements];",
    -       assignmentString,
    -       s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
    -       arrayDataName)
    -    } else {
    -      val arrayName = ctx.freshName("array")
    -      val unsafeArraySizeInBytes =
    -        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
    -        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
    -      val baseOffset = Platform.BYTE_ARRAY_OFFSET
    -
    -      val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    -        val isNullAssignment = if (!isMapKey) {
    -          s"$arrayDataName.setNullAt($i);"
    +
    +        if (eval.isNull == TrueLiteral) {
    --- End diff --
    
    I don't think we need the case because there are very few expressions that set `isNull` to `TrueLiteral`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r218029860
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    --- End diff --
    
    I see. We will do `if (!element.nullable || eval.isNull == FalseLiteral) {`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    **[Test build #96120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96120/testReport)** for PR 22439 at commit [`24fbf74`](https://github.com/apache/spark/commit/24fbf742fdd8490f57d29325100036e556847c77).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r217996706
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    +        s"\n$setArrayElement\n"
    --- End diff --
    
    It works functionally without `\n`. This is only for readability of generation code to avoid this output:
    `arrayData.setInt(0, 0);arrayData.setInt(1, 1);...`
    
    WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    **[Test build #96133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96133/testReport)** for PR 22439 at commit [`2286b6e`](https://github.com/apache/spark/commit/2286b6e8667dfde38ef5a7a47633ac34e29b7c69).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3157/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22439


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r217965854
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    +        s"\n$setArrayElement\n"
    --- End diff --
    
    nit: do we need to surround `setArrayElement` with `\n`s? Only `setArrayElement` should work?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96133/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3163/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22439#discussion_r217997181
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -75,87 +75,60 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
     
     private [sql] object GenArrayData {
       /**
    -   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
    +   * Return Java code pieces based on DataType and array size to allocate ArrayData class
        *
        * @param ctx a [[CodegenContext]]
        * @param elementType data type of underlying array elements
        * @param elementsCode concatenated set of [[ExprCode]] for each element of an underlying array
        * @param isMapKey if true, throw an exception when the element is null
    -   * @return (code pre-assignments, concatenated assignments to each array elements,
    -   *           code post-assignments, arrayData name)
    +   * @param functionName string to include in the error message
    +   * @return (array allocation, concatenated assignments to each array elements, arrayData name)
        */
       def genCodeToCreateArrayData(
           ctx: CodegenContext,
           elementType: DataType,
           elementsCode: Seq[ExprCode],
    -      isMapKey: Boolean): (String, String, String, String) = {
    +      isMapKey: Boolean,
    +      functionName: String): (String, String, String) = {
         val arrayDataName = ctx.freshName("arrayData")
    -    val numElements = elementsCode.length
    +    val numElements = s"${elementsCode.length}L"
     
    -    if (!CodeGenerator.isPrimitiveType(elementType)) {
    -      val arrayName = ctx.freshName("arrayObject")
    -      val genericArrayClass = classOf[GenericArrayData].getName
    +    val initialization = CodeGenerator.createArrayData(
    +      arrayDataName, elementType, numElements, s" $functionName failed.")
     
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +    val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    +      val setArrayElement = CodeGenerator.setArrayElement(
    +        arrayDataName, elementType, i.toString, eval.value)
    +
    +      val assignment = if (eval.isNull == FalseLiteral) {
    +        s"\n$setArrayElement\n"
    +      } else {
             val isNullAssignment = if (!isMapKey) {
    -          s"$arrayName[$i] = null;"
    +          s"$arrayDataName.setNullAt($i);"
             } else {
               "throw new RuntimeException(\"Cannot use null as map key!\");"
             }
    -        eval.code + s"""
    -         if (${eval.isNull}) {
    -           $isNullAssignment
    -         } else {
    -           $arrayName[$i] = ${eval.value};
    -         }
    -       """
    -      }
    -      val assignmentString = ctx.splitExpressionsWithCurrentInputs(
    -        expressions = assignments,
    -        funcName = "apply",
    -        extraArguments = ("Object[]", arrayName) :: Nil)
    -
    -      (s"Object[] $arrayName = new Object[$numElements];",
    -       assignmentString,
    -       s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
    -       arrayDataName)
    -    } else {
    -      val arrayName = ctx.freshName("array")
    -      val unsafeArraySizeInBytes =
    -        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
    -        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
    -      val baseOffset = Platform.BYTE_ARRAY_OFFSET
    -
    -      val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    -      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    -        val isNullAssignment = if (!isMapKey) {
    -          s"$arrayDataName.setNullAt($i);"
    +
    +        if (eval.isNull == TrueLiteral) {
    --- End diff --
    
    Since I saw the following case, I added this condition to reduce the generated Java byte code size.
    ```
    if (true) {
     ...
    } else {
     ...
    }
    ```
    
    I am neutral on keeping or removing this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22439
  
    **[Test build #96144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96144/testReport)** for PR 22439 at commit [`fff3361`](https://github.com/apache/spark/commit/fff3361276519ae93c265a029290f5ba9f70945a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org