You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2016/05/31 20:24:27 UTC

[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/13421

    [SPARK-15680][SQL] Disable comments in generated code in order to avoid perf. issues

    ## What changes were proposed in this pull request?
    
    In benchmarks involving tables with very wide and complex schemas (thousands of columns, deep nesting), I noticed that significant amounts of time (order of tens of seconds per task) were being spent generating comments during the code generation phase.
    
    The root cause of the performance problem stems from the fact that calling toString() on a complex expression can involve thousands of string concatenations, resulting in huge amounts (tens of gigabytes) of character array allocation and copying.
    
    In the long term, we can avoid this problem by passing StringBuilders down the tree and using them to accumulate output. As a short-term workaround, this patch guards comment generation behind a flag and disables comments by default (for wide tables / complex queries, these comments were being truncated prior to display and thus were not very useful).
    
    ## How was this patch tested?
    
    This was tested manually by running a Spark SQL query over an empty table with a very wide schema obtained from a real workload. Disabling comments brought the per-task time down from about 16 seconds to 600 milliseconds.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark disable-line-comments-in-codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13421
    
----
commit 0b6a190169ed0b16558c7d5fd3ba365a1b6795b9
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-05-31T20:20:08Z

    Use flag to disable comments in generated code.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13421


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    **[Test build #59689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59689/consoleFull)** for PR 13421 at commit [`db46241`](https://github.com/apache/spark/commit/db4624166bbb5e104923e77f222ce69c89c260fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    **[Test build #59683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59683/consoleFull)** for PR 13421 at commit [`db46241`](https://github.com/apache/spark/commit/db4624166bbb5e104923e77f222ce69c89c260fa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13421#discussion_r65278850
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -720,15 +721,23 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: String): String = {
    -    val name = freshName("c")
    -    val comment = if (text.contains("\n") || text.contains("\r")) {
    -      text.split("(\r\n)|\r|\n").mkString("/**\n * ", "\n * ", "\n */")
    +  def registerComment(text: => String): String = {
    +    // By default, disable comments in generated code because computing the comments themselves can
    +    // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
    +    // inputs with wide schemas. For more details on the performance issues that motivated this
    +    // flat, see SPARK-15680.
    +    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    We already have a thread-local SQLContext (SQLContext.getActive()), it could be used here.
    
    In BroadcastExchangeExec and prepare of subquery (in SparkPlan), we did not set the current SQLContext as active one, we should also fix that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59689/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    OK this isn't great, but I'm going to merge it first.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    **[Test build #59676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59676/consoleFull)** for PR 13421 at commit [`0b6a190`](https://github.com/apache/spark/commit/0b6a190169ed0b16558c7d5fd3ba365a1b6795b9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    **[Test build #59689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59689/consoleFull)** for PR 13421 at commit [`db46241`](https://github.com/apache/spark/commit/db4624166bbb5e104923e77f222ce69c89c260fa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13421#discussion_r65268708
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -720,15 +721,23 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: String): String = {
    -    val name = freshName("c")
    -    val comment = if (text.contains("\n") || text.contains("\r")) {
    -      text.split("(\r\n)|\r|\n").mkString("/**\n * ", "\n * ", "\n */")
    +  def registerComment(text: => String): String = {
    +    // By default, disable comments in generated code because computing the comments themselves can
    +    // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
    +    // inputs with wide schemas. For more details on the performance issues that motivated this
    +    // flat, see SPARK-15680.
    +    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    basically the change would be larger, but i think immutable configs like this make this feature pretty much dead.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    **[Test build #59676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59676/consoleFull)** for PR 13421 at commit [`0b6a190`](https://github.com/apache/spark/commit/0b6a190169ed0b16558c7d5fd3ba365a1b6795b9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13421#discussion_r65285433
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -720,15 +721,23 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: String): String = {
    -    val name = freshName("c")
    -    val comment = if (text.contains("\n") || text.contains("\r")) {
    -      text.split("(\r\n)|\r|\n").mkString("/**\n * ", "\n * ", "\n */")
    +  def registerComment(text: => String): String = {
    +    // By default, disable comments in generated code because computing the comments themselves can
    +    // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
    +    // inputs with wide schemas. For more details on the performance issues that motivated this
    +    // flat, see SPARK-15680.
    +    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    That's sad ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13421#discussion_r65278967
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -720,15 +721,23 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: String): String = {
    -    val name = freshName("c")
    -    val comment = if (text.contains("\n") || text.contains("\r")) {
    -      text.split("(\r\n)|\r|\n").mkString("/**\n * ", "\n * ", "\n */")
    +  def registerComment(text: => String): String = {
    +    // By default, disable comments in generated code because computing the comments themselves can
    +    // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
    +    // inputs with wide schemas. For more details on the performance issues that motivated this
    +    // flat, see SPARK-15680.
    +    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    Can we access `SQLContext` in the `catalyst` package, though?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    **[Test build #59683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59683/consoleFull)** for PR 13421 at commit [`db46241`](https://github.com/apache/spark/commit/db4624166bbb5e104923e77f222ce69c89c260fa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Merging in master/2.0.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59683/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13421#discussion_r65273569
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -720,15 +721,23 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: String): String = {
    -    val name = freshName("c")
    -    val comment = if (text.contains("\n") || text.contains("\r")) {
    -      text.split("(\r\n)|\r|\n").mkString("/**\n * ", "\n * ", "\n */")
    +  def registerComment(text: => String): String = {
    +    // By default, disable comments in generated code because computing the comments themselves can
    +    // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
    +    // inputs with wide schemas. For more details on the performance issues that motivated this
    +    // flat, see SPARK-15680.
    +    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    The problem with using a runtime `SQLConf` (actually a `CatalystConf` (to avoid a circular dependency)) is that we'd need to thread that configuration into the implementations of the `CodeGenerator.generate` method and that method has 60+ call sites, many of which do not have a readily-accessible configuration instance.
    
    If we had some thread-local mechanism for implicitly obtaining these configurations then this would be easy, but for now I don't see a simple way to thread this configuration without changing 20+ files.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    LGTM pending tests.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59676/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13421#discussion_r65268674
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -720,15 +721,23 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: String): String = {
    -    val name = freshName("c")
    -    val comment = if (text.contains("\n") || text.contains("\r")) {
    -      text.split("(\r\n)|\r|\n").mkString("/**\n * ", "\n * ", "\n */")
    +  def registerComment(text: => String): String = {
    +    // By default, disable comments in generated code because computing the comments themselves can
    +    // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
    +    // inputs with wide schemas. For more details on the performance issues that motivated this
    +    // flat, see SPARK-15680.
    +    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    hm this is not a runtime config -- can we use a runtime sqlconf?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/13421
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org