You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by joehalliwell <gi...@git.apache.org> on 2016/02/16 19:25:25 UTC

[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

GitHub user joehalliwell opened a pull request:

    https://github.com/apache/spark/pull/11221

    [SPARK-13242] [SQL] Generate one method per `when` clause

    This patch modifies code generation to split `when` clauses into separate methods.
    
    As the included test demonstrates, with the previous behaviour it was quite easy to breach the 64KB method size limit with even moderately complex `when` expressions.
    
    This contribution is my original work and I license the work to the project under the project's open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joehalliwell/spark SPARK-13242

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11221
    
----
commit 4dbdf6e15d1116b8e1eb44822fd29ead9b7d817d
Author: Joe Halliwell <jo...@gmail.com>
Date:   2016-02-09T15:36:15Z

    Add test case

commit 2b594539de4539e7d05345595149f0eb942640e1
Author: Joe Halliwell <jo...@gmail.com>
Date:   2016-02-16T17:29:17Z

    Split when clauses into separate methods
    
    This is presumably slower than packing them all into a single method, but
    does allow many more clauses. A further improvement might be to combine
    the two approaches using some heuristic to determine how many clauses
    can be combined in a single method.

commit 57c78a7a75935e182b8e3622a3e00e5c5dc7fd74
Author: Joe Halliwell <jo...@gmail.com>
Date:   2016-02-16T17:55:58Z

    Smoketest generated code

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11221#issuecomment-185318792
  
    @joehalliwell Since the implementation will be totally different, it make sense to open a new one, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by joehalliwell <gi...@git.apache.org>.
Github user joehalliwell commented on the pull request:

    https://github.com/apache/spark/pull/11221#issuecomment-185087175
  
    Thanks for taking the time to review and comment.
    
    @rxin Yes, I wondered about batching to (say) 64KB of source code before splitting. That's the approach used in https://github.com/apache/spark/pull/7076. Without measuring, I wasn't sure what the performance impact would be -- for example I believe JIT can inline small methods such as these -- but if you think it's worth it, I'll push in that general direction in future work.
    
    @davies Ahh... I'm only really familiar with few bits of codegen I needed to write this patch. Are there tests for or examples of whole stage codgen you could point me at?
    
    In the meantime, I'll implement your suggestion of falling back to interpretation. Shall I do that on this PR, or open a new one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11221


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11221#issuecomment-185319355
  
    @joehalliwell In order to test whole stage codegen, you need to have a SQL query to run.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11221#issuecomment-184870128
  
    @joehalliwell This will not work in whole stage codegen, because there is not a variable for the input row.
    
    I think an easy fix could be fallback to interpret mode if the number of branches go over 10?  you could take Cast as example.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11221#issuecomment-184813282
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13242] [SQL] Generate one method per `w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11221#issuecomment-184859152
  
    cc @davies
    
    I wonder if we can do something smarter (e.g. batch some?). This is going to regress performance for very simple cases, especially when we use whole-stage codegen.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org