You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/10/28 16:59:00 UTC

[GitHub] spark pull request #19598: [SPARK-22378]

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/19598

    [SPARK-22378]

    ## What changes were proposed in this pull request?
    
    This PR eliminates redundant null check in generated code for extracting an element from complex types. Since these code generation does not take care of `nullable` in `DataType` such as `ArrayType`, the generated code always has `isNullAt(index)`.  
    This PR avoids to generate `isNullAt(index)` if `nullable` is false in `DataType`.
    
    ## How was this patch tested?
    
    Added test cases into `ComplexTypeSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22378

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19598.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19598
    
----
commit 6057309a0cc98d0b480d468107fadd1b2a44a1c0
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-10-28T16:53:14Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Maybe just changing the schema with always-on nullability 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    I see. It makes sense for debugability. On the other hand, the NULLable property in schema is very important for optimizations to achieve better performance.
    
    One of ideas is to add conf for debug mode to enforce insertion of null check at reading data from an iterator or column vector.
     


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83453/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83173/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    **[Test build #83448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83448/testReport)** for PR 19598 at commit [`4262983`](https://github.com/apache/spark/commit/4262983f1c984c5e54881c7e087874ca753ea97c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Jenkins, retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    **[Test build #83173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83173/testReport)** for PR 19598 at commit [`6057309`](https://github.com/apache/spark/commit/6057309a0cc98d0b480d468107fadd1b2a44a1c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    **[Test build #83448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83448/testReport)** for PR 19598 at commit [`4262983`](https://github.com/apache/spark/commit/4262983f1c984c5e54881c7e087874ca753ea97c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19598#discussion_r148936905
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala ---
    @@ -186,6 +186,14 @@ case class GetArrayStructFields(
           val values = ctx.freshName("values")
           val j = ctx.freshName("j")
           val row = ctx.freshName("row")
    +      val nullSafeEval = if (field.nullable) {
    +        s"""
    +         if ($row.isNullAt($ordinal)) {
    +           $values[$j] = null;
    +         } else
    +        """
    +      } else ""
    --- End diff --
    
    Nit:
    ```Scala
    } else {
      ""
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19598#discussion_r148931429
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala ---
    @@ -309,6 +312,8 @@ case class GetMapValue(child: Expression, key: Expression)
         val found = ctx.freshName("found")
         val key = ctx.freshName("key")
         val values = ctx.freshName("values")
    +    val nullCheck = if (!child.dataType.asInstanceOf[MapType].valueContainsNull) ""
    +      else s" || $values.isNullAt($index)"
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    ping @gatorsmile @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19598#discussion_r148931423
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala ---
    @@ -242,9 +243,11 @@ case class GetArrayItem(child: Expression, ordinal: Expression)
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
           val index = ctx.freshName("index")
    +      val nullCheck = if (!child.dataType.asInstanceOf[ArrayType].containsNull) ""
    +        else s" || $eval1.isNullAt($index)"
    --- End diff --
    
    nit:
    ```
    if {
    
    } else {
    
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    LGTM
    
    BTW, now, more and more optimization are added based on the NULLable property in schema. Normally, the database enforces null checking during inserting/updating the data, but we do not enforce it because the data is not managed by us. If the users provide wrong nullable info in the schema. This might be harder for us to debug these runtime issues.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19598#discussion_r148931334
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala ---
    @@ -186,6 +186,7 @@ case class GetArrayStructFields(
           val values = ctx.freshName("values")
           val j = ctx.freshName("j")
           val row = ctx.freshName("row")
    +      val nullCheckElement = if (!field.nullable) "false" else s"$row.isNullAt($ordinal)"
    --- End diff --
    
    how about
    ```
    val nullSafeEval = if (field.nullable) {
      s"""
        if ($row.isNullAt($ordinal)) {
          $values[$j] = null;
        } else
      """
    } else ""
    ...
    ...
    
    $nullSafeEval {
      $values[$j] = ${ctx.getValue(row, field.dataType, ordinal.toString)};
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19598#discussion_r148936939
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala ---
    @@ -309,6 +318,9 @@ case class GetMapValue(child: Expression, key: Expression)
         val found = ctx.freshName("found")
         val key = ctx.freshName("key")
         val values = ctx.freshName("values")
    +    val nullCheck = if (child.dataType.asInstanceOf[MapType].valueContainsNull) {
    +      s" || $values.isNullAt($index)"
    +    } else ""
    --- End diff --
    
    The same here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    **[Test build #83173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83173/testReport)** for PR 19598 at commit [`6057309`](https://github.com/apache/spark/commit/6057309a0cc98d0b480d468107fadd1b2a44a1c0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83448/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    ping @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    **[Test build #83453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83453/testReport)** for PR 19598 at commit [`0f798cb`](https://github.com/apache/spark/commit/0f798cb46d6982e9f643af04c4d9bc665164c699).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83447/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    @gatorsmile could you please review this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19598


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    **[Test build #83453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83453/testReport)** for PR 19598 at commit [`0f798cb`](https://github.com/apache/spark/commit/0f798cb46d6982e9f643af04c4d9bc665164c699).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19598
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org