
Posted to reviews@spark.apache.org by inouehrs <gi...@git.apache.org> on 2016/06/07 02:43:15 UTC

[GitHub] spark pull request #13539: [SPARK-15795] [SQL] Enable more optimizations in ...

GitHub user inouehrs opened a pull request:

    https://github.com/apache/spark/pull/13539

    [SPARK-15795] [SQL] Enable more optimizations in whole stage codegen when isNull is a compile-time constant

    ## What changes were proposed in this pull request?
    Whole stage codegen often creates an `isNull` variable initialized with the constant _false_, such as
    `boolean mapelements_isNull = false || false;`
    
    If this `isNull` variable is never assigned again, whole stage codegen can optimize further by treating `isNull` as a compile-time constant.
    
    In the example below, which is generated for a dataset map operation, `mapelements_isNull` defined at line 115 can be treated as a compile-time constant (false).
    With that assumption, whole stage codegen eliminates the `zeroOutNullBytes` call at line 119 and the if-statement at line 121.
    Besides making the generated code more readable, eliminating `zeroOutNullBytes` improves performance, because the call is difficult for the Java JIT compiler to remove.
    
    without this patch
    ```
    /* 107 */       // CONSUME: Project [id#0L AS l#3L]
    /* 108 */       // CONSUME: DeserializeToObject l#3: bigint, obj#16: bigint
    /* 109 */       // CONSUME: MapElements <function1>, obj#17: bigint
    /* 110 */       // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#18L]
    /* 111 */       // <function1>.apply
    /* 112 */       Object mapelements_obj = ((Expression) references[1]).eval(null);
    /* 113 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
    /* 114 */
    /* 115 */       boolean mapelements_isNull = false || false;
    /* 116 */       final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
    /* 117 */
    /* 118 */       // CONSUME: WholeStageCodegen
    /* 119 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 120 */
    /* 121 */       if (mapelements_isNull) {
    /* 122 */         serializefromobject_rowWriter.setNullAt(0);
    /* 123 */       } else {
    /* 124 */         serializefromobject_rowWriter.write(0, mapelements_value);
    /* 125 */       }
    /* 126 */       append(serializefromobject_result);
    ```
    
    with this patch
    ```
    /* 107 */       // CONSUME: Project [id#0L AS l#3L]
    /* 108 */       // CONSUME: DeserializeToObject l#3: bigint, obj#9: bigint
    /* 109 */       // CONSUME: MapElements <function1>, obj#10: bigint
    /* 110 */       // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#11L]
    /* 111 */       // <function1>.apply
    /* 112 */       Object mapelements_obj = ((Expression) references[1]).eval(null);
    /* 113 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
    /* 114 */
    /* 115 */       final boolean mapelements_isNull = false || false;
    /* 116 */       final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
    /* 117 */
    /* 118 */       // CONSUME: WholeStageCodegen
    /* 119 */       serializefromobject_rowWriter.write(0, mapelements_value);
    /* 120 */       append(serializefromobject_result);
    ```
    
    
    ## How was this patch tested?
    
    by unit tests
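
The effect described in the pull request above can be illustrated with a small string-based generator. This is only a sketch under stated assumptions, not the code in this patch and not Spark's codegen API: the class `NullCheckFoldingSketch`, its `Field` type, and the `emitWrites` method are hypothetical names invented for illustration. It assumes the generator already knows that a null flag is the literal `false`, and in that case emits a plain `write` with no `zeroOutNullBytes` call and no if-statement.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of null-check elimination in a string-based code generator. */
final class NullCheckFoldingSketch {

    /** One output field: the Java expressions for its null flag and its value. */
    static final class Field {
        final String isNull;
        final String value;

        Field(String isNull, String value) {
            this.isNull = isNull;
            this.value = value;
        }

        /** True when the null flag is the literal "false", so it is known at codegen time. */
        boolean neverNull() {
            return "false".equals(isNull);
        }
    }

    /** Emits row-writing code, skipping null handling for fields that can never be null. */
    static String emitWrites(String writer, List<Field> fields) {
        StringBuilder code = new StringBuilder();

        // The null bits only need clearing if at least one field may be null.
        boolean anyNullable = false;
        for (Field f : fields) {
            anyNullable |= !f.neverNull();
        }
        if (anyNullable) {
            code.append(writer).append(".zeroOutNullBytes();\n");
        }

        for (int i = 0; i < fields.size(); i++) {
            Field f = fields.get(i);
            String write = writer + ".write(" + i + ", " + f.value + ");\n";
            if (f.neverNull()) {
                // Constant null flag: emit the write directly.
                code.append(write);
            } else {
                // Unknown null flag: keep the runtime check.
                code.append("if (" + f.isNull + ") {\n"
                    + "  " + writer + ".setNullAt(" + i + ");\n"
                    + "} else {\n"
                    + "  " + write
                    + "}\n");
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        // Null flag known to be false: only the write is emitted.
        System.out.print(emitWrites("rowWriter",
            Collections.singletonList(new Field("false", "mapelements_value"))));
        System.out.println("----");
        // Null flag only known at run time: the full check is emitted.
        System.out.print(emitWrites("rowWriter",
            Collections.singletonList(new Field("mapelements_isNull", "mapelements_value"))));
    }
}
```

The sketch only recognizes the literal `false`. Recognizing that a named variable such as `mapelements_isNull` holds a constant requires extra information about the generated code, which is the part of this patch discussed in the review comments.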


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/inouehrs/spark dev_nullcheck_opt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13539
    
----
commit fb6f3b5a6c5fb80adc249fbb54a8c2ed884c7dbb
Author: Hiroshi Inoue <in...@jp.ibm.com>
Date:   2016-06-06T17:31:04Z

    enable null check elimination based on generated code

commit 32f158ce5da29f9562c9aa3b4751d2241c4898ca
Author: Hiroshi Inoue <in...@jp.ibm.com>
Date:   2016-06-06T19:41:32Z

    Merge branch 'apache/master' into dev_nullcheck_opt

commit 60f582dc3e75db6ff5fe642b692f22f5d7bc7ab2
Author: Hiroshi Inoue <in...@jp.ibm.com>
Date:   2016-06-07T01:53:33Z

    make definition of isNull final

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by inouehrs <gi...@git.apache.org>.

Github user inouehrs commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    @cloud-fan @davies Thank you so much for the comments. 
    I agree that my implementation is hacky. I tried to do the optimization without adding members to ExprCode or Expression. I will look for a less hacky way.
    
    (BTW, my original motivation was to eliminate zeroOutNullBytes from the innermost loop so that the Java JIT compiler can optimize the loop better. zeroOutNullBytes seemed to cost a few percent of performance for a simple map operation.)




[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    Can one of the admins verify this patch?



[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    **[Test build #60412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60412/consoleFull)** for PR 13539 at commit [`186283e`](https://github.com/apache/spark/commit/186283e9321120b9a8def7a3ba51ecf5c423e049).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by inouehrs <gi...@git.apache.org>.

Github user inouehrs commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    @rxin could you please review this pull request (or please suggest someone I should ask for the review)?



[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    This PR seems kind of hacky to me. The optimization in `GenerateUnsafeProjection` that depends on the foldability of generated code is also hacky. This is essentially something our string-based codegen framework doesn't support.
    
    I think we should think about it more, and come up with a holistic solution, i.e. find out all the patterns that represent a foldable value.



[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13539
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60412/

