You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/01/29 00:48:36 UTC

[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/20419

    [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comment

    ## What changes were proposed in this pull request?
    
    This PR always adds `codegenStageId` in comment of the generated class.
    
    ```
    /* 001 */ public Object generate(Object[] references) {
    /* 002 */   return new GeneratedIteratorForCodegenStage1(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   // id=1
    /* 007 */   private Object[] references;
    ...
    ```
    
    ## How was this patch tested?
    
    Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-23032-follow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20419
    
----
commit 6ee910611e7b6311f4ec1c5afb0ce5274fcf6805
Author: Kazuaki Ishizaki <is...@...>
Date:   2018-01-29T00:35:06Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    ping @rednaxelafx


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86882/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    ping @rednaxelafx


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    I always leave the comment regardless of the `spark.sql.codegen.useIdInClassName` for a unified way to access the ID from the comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #87084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87084/testReport)** for PR 20419 at commit [`cb7a16b`](https://github.com/apache/spark/commit/cb7a16b1e1abdb7dcb45f2a18085dda0cae8c12f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    ping @rednaxelafx


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #87076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87076/testReport)** for PR 20419 at commit [`cb7a16b`](https://github.com/apache/spark/commit/cb7a16b1e1abdb7dcb45f2a18085dda0cae8c12f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #86774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86774/testReport)** for PR 20419 at commit [`4a09eac`](https://github.com/apache/spark/commit/4a09eac999ba223d379b96ac8335a2fc50ff52c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #87084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87084/testReport)** for PR 20419 at commit [`cb7a16b`](https://github.com/apache/spark/commit/cb7a16b1e1abdb7dcb45f2a18085dda0cae8c12f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/863/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #87076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87076/testReport)** for PR 20419 at commit [`cb7a16b`](https://github.com/apache/spark/commit/cb7a16b1e1abdb7dcb45f2a18085dda0cae8c12f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Then, we should add a test case to ensure it will not be broken. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86774/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    @rednaxelafx Does the following [codes](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1286-L1294) address your concern?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #86745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86745/testReport)** for PR 20419 at commit [`6ee9106`](https://github.com/apache/spark/commit/6ee910611e7b6311f4ec1c5afb0ce5274fcf6805).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    I gave it a bit more thought. Here's an alternative proposal: instead of using a "force comment" mechanism in the current form (which still gets a `ctx.freshName("c")` that the caller has no control over), why don't we provide an alternative API that also forces comment, but allows the caller to specify a (within the `CodegenContext`) unique ID for the placeholder comment?
    
    In this case, instead of using `/*wholestagecodegen_c*/` as the placeholder, which can be brittle if someone adds new calls to `ctx.registerComment()` in `WholeStageCodegenExec`, we can call `ctx.registerComment(comment, placeholderId = "wsc_codegenStageId")`, which we can be sure that it'll end up creating a placeholder comment of `/*wsc_codegenStageId*/`, stable across all whole-stage generated main classes, that would never affect the codegen cache behavior.
    
    @kiszk WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #86774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86774/testReport)** for PR 20419 at commit [`4a09eac`](https://github.com/apache/spark/commit/4a09eac999ba223d379b96ac8335a2fc50ff52c9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #86882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86882/testReport)** for PR 20419 at commit [`f07dc2d`](https://github.com/apache/spark/commit/f07dc2dffb625c44b5c33d97b56b719564c61d51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    I see. I will open a new JIRA tomorrow.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r167947791
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1226,14 +1226,21 @@ class CodegenContext {
     
       /**
        * Register a comment and return the corresponding place holder
    +   *
    +   * @param placeholderId a string for a place holder
    +   * @param force whether to force registering the comments
        */
    -  def registerComment(text: => String): String = {
    +  def registerComment(
    +      text: => String,
    +      placeholderId: String = "",
    +      force: Boolean = false): String = {
         // By default, disable comments in generated code because computing the comments themselves can
         // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
         // inputs with wide schemas. For more details on the performance issues that motivated this
         // flat, see SPARK-15680.
    -    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    -      val name = freshName("c")
    +    if (force ||
    +      SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    Now, the question is whether we can use `SQLConf.get` in this case.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87076/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    @kiszk SGTM and LGTM. Let's ship it!
    
    One more question on the side: with the `forceComment = true`, are we fully sure that won't affect the equality of `CodeAndComment`?
    The whole point of this PR is to have a way to embed the ID into the non-executable code of the generated code so that it won't affect the codegen cache hit, right?
    
    Could you please post an example of either the generated code or the metrics, e.g. a `SortMergeJoin` on two identical `spark.range(3)`s, and confirm that even when the two `range()`s are codegen's into different `codegenStageId`s, with the `spark.sql.codegen.useIdInClassName` turned off, the two stages would still hit the codegen cache? Basically to verify the example I gave in https://github.com/apache/spark/pull/20224#issuecomment-357091842 still hits the codegen cache when `spark.sql.codegen.useIdInClassName=false`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r164509380
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -542,6 +542,7 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
             s"""Codegend pipeline for stage (id=$codegenStageId)
                |${this.treeString.trim}""".stripMargin)}
           final class $className extends ${classOf[BufferedRowIterator].getName} {
    +        ${ctx.registerComment(s"codegenStageId=$codegenStageId", true)}
    --- End diff --
    
    sure, done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86745/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/436/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/341/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r167775177
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1226,14 +1226,21 @@ class CodegenContext {
     
       /**
        * Register a comment and return the corresponding place holder
    +   *
    +   * @param placeholderId a string for a place holder
    --- End diff --
    
    Nit: can we rephrase this ScalaDoc a bit, maybe like:
    ```scala
    /**
     * ...
     * @param placeholderId an optionally specified identifier for the comment's placeholder. The caller should make sure this identifier is unique within the compilation unit. If this argument is not specified, a fresh identifier will be automatically created and used as the placeholder.
     * ...
     */
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r167853559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1226,14 +1226,21 @@ class CodegenContext {
     
       /**
        * Register a comment and return the corresponding place holder
    +   *
    +   * @param placeholderId a string for a place holder
    +   * @param force whether to force registering the comments
        */
    -  def registerComment(text: => String): String = {
    +  def registerComment(
    +      text: => String,
    +      placeholderId: String = "",
    +      force: Boolean = false): String = {
         // By default, disable comments in generated code because computing the comments themselves can
         // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
         // inputs with wide schemas. For more details on the performance issues that motivated this
         // flat, see SPARK-15680.
    -    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    -      val name = freshName("c")
    +    if (force ||
    +      SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    unrelated, but should this be a SQL conf?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87405/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20419


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    @kiszk This is unable to merge to 2.3. Could you open a new JIRA for this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87084/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    I did the experiment that I asked about at https://github.com/apache/spark/pull/20419#issuecomment-361359825 , and verified that under the current implementation, this PR will not affect the codegen cache hit behavior.
    
    But the way it's working is a bit brittle. The `body` field of the `CodeAndComment` returned from `WholeStageCodegenExec.doCodeGen()` looks like the following, when `spark.sql.codegen.comments=false && spark.sql.codegen.useIdInClassName=false`:
    ```java
    /*wholestagecodegen_c*/
    final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    ```
    Notice the `/*wholestagecodegen_c*/` placeholder comment. It's the artifact produced by the `ctx.registerComment()` call, where `wholestagecodegen_` is a prefix automatically derived from the calling class, and `c` is from a `ctx.freshName("c")`, subject to an integer suffix when names collide.
    
    The way it's used right now, because from `WholeStageCodegenExec` we're only making very few calls to `ctx.registerComment()` and all of them happen in a deterministic order, that's how this identifier embedded in the placeholder comment would be the same no matter what codegen stage it is.
    
    But let's assume if someone doesn't know about this, and then accidentally added code in `WholeStageCodegenExec` that conditionally calls `ctx.registerComment()`, then this placeholder comment is NOT guaranteed to be the same, and that will affect the codegen cache hit behavior.
    
    I'd say functional wise this PR is good to go; maybe we want to add a comment somewhere in `WholeStageCodegenExec` to note the subtlety, or maybe think about an alternative sometime in the future.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r164951129
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1227,12 +1227,13 @@ class CodegenContext {
       /**
        * Register a comment and return the corresponding place holder
        */
    -  def registerComment(text: => String): String = {
    +  def registerComment(text: => String, forceComment: Boolean = false): String = {
    --- End diff --
    
    Nit: rename it to `force`?
    
    Also adding the text?
    ```
    @param force whether to force registering the comments
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    @gatorsmile Not directly. The `CodeAndComment` case class is just a "container", it doesn't handle what gets into the `body` field. When we force embed a comment, it'll leave a comment as a placeholder in the generated code, which is in the `body` (the actual comment contents are in the `comment` map). I was just curious if any reviews knows off the top of their heads whether or not this placeholder may affect equality.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r167958486
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1226,14 +1226,21 @@ class CodegenContext {
     
       /**
        * Register a comment and return the corresponding place holder
    +   *
    +   * @param placeholderId a string for a place holder
    +   * @param force whether to force registering the comments
        */
    -  def registerComment(text: => String): String = {
    +  def registerComment(
    +      text: => String,
    +      placeholderId: String = "",
    +      force: Boolean = false): String = {
         // By default, disable comments in generated code because computing the comments themselves can
         // be extremely expensive in certain cases, such as deeply-nested expressions which operate over
         // inputs with wide schemas. For more details on the performance issues that motivated this
         // flat, see SPARK-15680.
    -    if (SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    -      val name = freshName("c")
    +    if (force ||
    +      SparkEnv.get != null && SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
    --- End diff --
    
    IIUC, the answer seems to be No. See [this discussion](https://github.com/apache/spark/pull/19449#discussion_r143291724).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/598/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    LGTM, and +1 on @viirya 's idea. I like it better for the comment to be on top of the class declaration instead of inside it; but I'm okay either way if others have strong opinion otherwise. As long as the comment line is right next to the class declaration line, that's good enough for easy pattern matching.
    
    BTW, to save a few characters from the generated code, maybe we don't need to generate this comment when the `spark.sql.codegen.useIdInClassName` is turned on, since the `codegenStageId` is in the class name already.
    But for the ease of having a unified way to access the ID from the comment, I like having this comment always there. I guess @kiszk and @viirya are on the same side on this one for always having the comment, right? In that case I'd agree and let's get this in :-)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #87405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87405/testReport)** for PR 20419 at commit [`0ad1e72`](https://github.com/apache/spark/commit/0ad1e72c5ec874d688e08b0cfa0a297867be5275).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/316/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #87405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87405/testReport)** for PR 20419 at commit [`0ad1e72`](https://github.com/apache/spark/commit/0ad1e72c5ec874d688e08b0cfa0a297867be5275).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    @rednaxelafx I understand your concern when `ctx.registerComment()` is conditionally called.
    
    If we call `ctx.registerComment()` with the specific identified multiple times, how do we handle? (e.g. `Expression.genCode()` may be called multiple times from different contexts.
    
    I am sorry that I will slowly response next few days due to a short vacation.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20419#discussion_r164321458
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -542,6 +542,7 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
             s"""Codegend pipeline for stage (id=$codegenStageId)
                |${this.treeString.trim}""".stripMargin)}
           final class $className extends ${classOf[BufferedRowIterator].getName} {
    +        ${ctx.registerComment(s"codegenStageId=$codegenStageId", true)}
    --- End diff --
    
    Instead of putting this comment inside the class, how about let it on top of it? Like:
    
    ```/* 001 */ public Object generate(Object[] references) {
    /* 002 */   return new GeneratedIteratorForCodegenStage1(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ // codegenStageId=1
    /* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 007 */   private Object[] references;
    ...
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #86745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86745/testReport)** for PR 20419 at commit [`6ee9106`](https://github.com/apache/spark/commit/6ee910611e7b6311f4ec1c5afb0ce5274fcf6805).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    **[Test build #86882 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86882/testReport)** for PR 20419 at commit [`f07dc2d`](https://github.com/apache/spark/commit/f07dc2dffb625c44b5c33d97b56b719564c61d51).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/605/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    @kiszk For this specific kind of usage, I don't think using a hardcoded stable ID will be a problem.
    The comment we're talking about is the kind the can only appear once in a single whole-stage codegen unit -- there's only one `GeneratedIterator` class with the `processNext()` method as the main entry point. Providing such a mechanism allows codegen developers to mark important information without risking to affect codegen cache behavior.
    
    For safety, we can add a runtime check. Let's assume this new method is called `ctx.registerCommentWithId()`, then inside this method we can implement something very similar to `ctx.freshName()`, but instead of producing a new name with a potential integer suffix appended, we can either log a warning message or throw an exception. That way the developer would have an easy way to catch this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org