You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/09/15 15:17:53 UTC

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dump query execution info to a...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22429

    [SPARK-25440][SQL] Dump query execution info to a file

    ## What changes were proposed in this pull request?
    
    In the PR, I propose new method for debugging queries by dumping info about their execution to a file. It saves logical, optimized and physical plan similar to the `explain()` method + generated code. One of the advantages of the method over `explain` is it doesn't truncate output and doesn't not materializes full output in memory which can cause OOMs.
    
    ## How was this patch tested?
    
    Added a test which checks that new method dumps correct info about a query.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 plan-to-file

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22429.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22429
    
----
commit 19b9a684b6d0985cf563257d6321fd6f14458d36
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T12:22:42Z

    Stub implementation and a test

commit 90832f9571e8cafde622069dec4f837141b07c30
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T12:57:24Z

    Saving all plans to file

commit 673ae565e42acc00df9acfba670c0491172ffb19
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T13:02:49Z

    Output attributes

commit fbde8120122c83eecbad2d060f959e467ecb4ff0
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T13:14:50Z

    Output whole stage codegen

commit dca19d33ed516bea8e3d113c5e81e3e1e2f2d77d
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T13:25:43Z

    Reusing codegenToOutputStream

commit 66351a09f60f0c9b21abe12bdb559a36761e8a8c
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T14:16:22Z

    Code de-duplication

commit 2ee75bcd4d495eb6031581ad7a38e757c254175b
Author: Maxim Gekk <ma...@...>
Date:   2018-09-15T14:41:26Z

    Do not truncate fields

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97253/testReport)** for PR 22429 at commit [`e4567cb`](https://github.com/apache/spark/commit/e4567cbc01d58a17a1b4ba8a618463b8872a69a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96097/testReport)** for PR 22429 at commit [`9b2a3e6`](https://github.com/apache/spark/commit/9b2a3e664564182e325ef08785c998c6ff9b5367).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @hvanhovell May I ask you to look at this one more time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217913739
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -469,7 +470,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       def treeString: String = treeString(verbose = true)
     
       def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
    -    generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString
    +    val bos = new ByteArrayOutputStream()
    +    treeString(bos, verbose, addSuffix)
    +    bos.toString
    +  }
    +
    +  def treeString(os: OutputStream, verbose: Boolean, addSuffix: Boolean): Unit = {
    --- End diff --
    
    Can you please use a `java.io.Writer` or something else you can directly write a string to? You are now using `getBytes()` everywhere and that is far from cheap because it needs to encode the chars and allocate a byte array for each string.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r219977185
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -196,7 +196,7 @@ case class RDDScanExec(
         }
       }
     
    -  override def simpleString: String = {
    +  override def simpleString(maxFields: Option[Int]): String = {
         s"$nodeName${Utils.truncatedString(output, "[", ",", "]")}"
    --- End diff --
    
    `maxFields` should be used here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96192/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96192/testReport)** for PR 22429 at commit [`24dbbba`](https://github.com/apache/spark/commit/24dbbbadc100bcc1b66ad1f43afe65d86401d835).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96214/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @HyukjinKwon @cloud-fan Thank you for looking at the PR.
    
    So, if I split the PR to 2 PRs:
    1. Writing truncated plans to a file 
    2. Control number of fields in truncated strings.
    
    , it would be better for review, right?
    
    > I think it's okay to expose that number to toFile. I think this is an internal API, right?
    
    Yes, it is.
    
    > People could just set whatever number they want I guess. Fix me if I misread.
    
    I just had a concern that there are much more places (~10) besides of the `toFile` method where I have to pass some value for `maxFields`. In all such places I have to read a SQL config.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r221011834
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +170,56 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
    +   */
    +  val DEFAULT_MAX_TO_STRING_FIELDS = 25
    +
    +  private[spark] def maxNumToStringFields = {
    +    if (SparkEnv.get != null) {
    +      SparkEnv.get.conf.getInt("spark.debug.maxToStringFields", DEFAULT_MAX_TO_STRING_FIELDS)
    --- End diff --
    
    I have added new SQL config and I am testing it for now. My concern is how to combine old config, new config and passed parameter. I am going to take maximum of them like:
    ```scala
     private[spark] def maxNumToStringFields = {
        val legacyLimit = if (SparkEnv.get != null) {
          SparkEnv.get.conf.getInt("spark.debug.maxToStringFields", DEFAULT_MAX_TO_STRING_FIELDS)
        } else {
          DEFAULT_MAX_TO_STRING_FIELDS
        }
        val sqlConfLimit = SQLConf.get.maxToStringFields
    
        Math.max(sqlConfLimit, legacyLimit)
      }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96527/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96178/testReport)** for PR 22429 at commit [`3324927`](https://github.com/apache/spark/commit/33249274cd9b2fa8ff5e934fe572123e562bf105).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97253/testReport)** for PR 22429 at commit [`e4567cb`](https://github.com/apache/spark/commit/e4567cbc01d58a17a1b4ba8a618463b8872a69a2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97580/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96109/testReport)** for PR 22429 at commit [`ce2c086`](https://github.com/apache/spark/commit/ce2c08688bb8b51e97f686c95279a5f42b52116a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96109/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97256/testReport)** for PR 22429 at commit [`9b72104`](https://github.com/apache/spark/commit/9b721046c103f55d0abd4f65fd3e2e4b60be5ace).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218209230
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS,
    +        Utils.DEFAULT_MAX_TO_STRING_FIELDS)
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
    +
    +      try {
    +        SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString)
    +        writer.write("== Parsed Logical Plan ==\n")
    --- End diff --
    
    Combined


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @MaxGekk, couple of questions for its implementation from a cursory look. 
    
    It's the implementation is complicated here:
    
    1. it tries to use writer and avoid to construct the string. I think it's unlikely a single plan causes OOO. I think you can just iterate the plan and write out the trees.
    
    2. This looks introduces `None` concept to indicate no limit which makes this PR hard to read. I think it's okay to expose that number to `toFile`. I think this is an internal API, right? People could just set whatever number they want I guess.
    
    Fix me if I misread. If we get rid of those two points, I think the PR size will be drastically small.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    I am closing the PR since a part of it has been merged in #23018 already, and the rest is coming soon.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97302/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @gatorsmile @HyukjinKwon @viirya @rednaxelafx Are you ok with the proposed changes or there is something which blocks the PR for now?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218211903
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS,
    +        Utils.DEFAULT_MAX_TO_STRING_FIELDS)
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
    +
    +      try {
    +        SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString)
    --- End diff --
    
    > It is generally a bad idea to change this conf as people expect that it is immutable.
    
    Maybe it is generally a bad idea but the purpose of the `toFile` is to call it during debug of an issue. Currently when someone need to output all fields, he/she has to set `spark.debug.maxToStringFields` before `explain()` and revert it back after `explain()`. If you want, I can add a comment which tells to an user that the `toFile` changes the config during its invoke.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by boy-uber <gi...@git.apache.org>.
Github user boy-uber commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    > > @boy-uber the thing you are suggesting is a pretty big undertaking and beyond the scope of this PR.
    > > If you are going to add structured plans to the explain output, you probably also want some guarantees about stability over multiple spark versions and you probably also want to be able to reconstruct the plan. Neither is easy. If you want to have this in Spark, then I suggest sending a proposal to the dev list.
    > 
    > Yeah, that is a larger change and may need more discussion. Your points about adding structured plans like that are great! Let me send a email to the dev list then! Thanks for the suggestion :)
    
    @MaxGekk I sent email to spark dev list about structured plan logging, but did not get any response. Thus circle back to discussion here. Do you have interest for work similar to that? If so, we could sync up more offline.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r224415554
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/CatalystDataToAvro.scala ---
    @@ -52,7 +52,7 @@ case class CatalystDataToAvro(child: Expression) extends UnaryExpression {
         out.toByteArray
       }
     
    -  override def simpleString: String = {
    +  override def simpleString(maxFields: Option[Int]): String = {
         s"to_avro(${child.sql}, ${child.dataType.simpleString})"
    --- End diff --
    
    passed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97600/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96527/testReport)** for PR 22429 at commit [`4ec5732`](https://github.com/apache/spark/commit/4ec57326612adb2d650e425d03eae70366480242).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217928334
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS,
    +        Utils.DEFAULT_MAX_TO_STRING_FIELDS)
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
    +
    +      try {
    +        SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString)
    --- End diff --
    
    It is generally a bad idea to change this conf as people expect that it is immutable. Also this change has some far reaching consequences, others will now also be exposed to a different `Utils.MAX_TO_STRING_FIELDS` value when calling `explain()`. Can you please just pass the parameter down the tree?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    >  Can we change `maxFields: Option[Int]` to `maxFields: Int` so that the caller will never forget to pass it in and it should either get it from some place or use the default value.
    
    @zsxwing Sure, but only replacing `Option[Int]` by `Int` cannot guarantee that. Need to remove default value to force a caller to pass a value to `simpleString()`



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r232604420
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
    @@ -176,9 +176,9 @@ case class TakeOrderedAndProjectExec(
     
       override def outputPartitioning: Partitioning = SinglePartition
     
    -  override def simpleString: String = {
    -    val orderByString = Utils.truncatedString(sortOrder, "[", ",", "]")
    -    val outputString = Utils.truncatedString(output, "[", ",", "]")
    +  override def simpleString(maxFields: Option[Int]): String = {
    --- End diff --
    
    Can we just get rid of the `maxFields`? I think this makes the PR hard to read.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @MaxGekk Make sense. Could you also try to remove the default value?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98465/testReport)** for PR 22429 at commit [`f7de26d`](https://github.com/apache/spark/commit/f7de26d39df135d38b87660e5f68764891106508).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by boy-uber <gi...@git.apache.org>.
Github user boy-uber commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    > @boy-uber the thing you are suggesting is a pretty big undertaking and beyond the scope of this PR.
    > 
    > If you are going to add structured plans to the explain output, you probably also want some guarantees about stability over multiple spark versions and you probably also want to be able to reconstruct the plan. Neither is easy. If you want to have this in Spark, then I suggest sending a proposal to the dev list.
    
    Yeah, that is a larger change and may need more discussion. Your points about adding structured plans like that are great! Let me send a email to the dev list then! Thanks for the suggestion :)
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r223979392
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -633,4 +633,14 @@ package object config {
           .stringConf
           .toSequence
           .createWithDefault(Nil)
    +
    +  private[spark] val MAX_TO_STRING_FIELDS =
    +    ConfigBuilder("spark.debug.maxToStringFields")
    +      .internal()
    +      .doc("Maximum number of fields of sequence-like entries that can be converted to strings " +
    --- End diff --
    
    What is a sequence like entry?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    May I ask you @hvanhovell @zsxwing to review the PR one more time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217928631
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       def treeString: String = treeString(verbose = true)
     
       def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
    -    generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString
    +    val baos = new ByteArrayOutputStream()
    --- End diff --
    
    In this particular method, there is no benefits. This was changed to reused the method which accepts `OutputStream` instead of `StringBuilder`. Benefit of `OutputStream` over `StringBuilder` is no full materialization in memory and no string size limit.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @gatorsmile @rednaxelafx @HyukjinKwon @viirya @hvanhovell Could you review the PR, please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r219729210
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -668,11 +670,19 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
       override def generateTreeString(
           depth: Int,
           lastChildren: Seq[Boolean],
    -      builder: StringBuilder,
    +      writer: Writer,
           verbose: Boolean,
           prefix: String = "",
    -      addSuffix: Boolean = false): StringBuilder = {
    -    child.generateTreeString(depth, lastChildren, builder, verbose, s"*($codegenStageId) ")
    +      addSuffix: Boolean = false,
    +      maxFields: Option[Int]): Unit = {
    +    child.generateTreeString(
    +      depth,
    +      lastChildren,
    +      writer,
    +      verbose,
    +      s"*($codegenStageId) ",
    +      false,
    --- End diff --
    
    named boolean : `addSuffix = false`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96093/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97600/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r224389798
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -633,4 +633,14 @@ package object config {
           .stringConf
           .toSequence
           .createWithDefault(Nil)
    +
    +  private[spark] val MAX_TO_STRING_FIELDS =
    +    ConfigBuilder("spark.debug.maxToStringFields")
    +      .internal()
    +      .doc("Maximum number of fields of sequence-like entries that can be converted to strings " +
    --- End diff --
    
    I don't know how else I can describe all kind of classes where the parameter is applicable. If you have better words, you are welcome. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98461/testReport)** for PR 22429 at commit [`76f4248`](https://github.com/apache/spark/commit/76f424830418129c12a2a08d81f19377490c95eb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96214/testReport)** for PR 22429 at commit [`24dbbba`](https://github.com/apache/spark/commit/24dbbbadc100bcc1b66ad1f43afe65d86401d835).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @hvanhovell @zsxwing Could you look at this during the next a few days otherwise I will be able to come back to the PR in 3 weeks, please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r219276920
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala ---
    @@ -16,11 +16,33 @@
      */
     package org.apache.spark.sql.execution
     
    +import scala.io.Source
    +
     import org.apache.spark.sql.AnalysisException
     import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, OneRowRelation}
     import org.apache.spark.sql.test.SharedSQLContext
     
     class QueryExecutionSuite extends SharedSQLContext {
    +  test("dumping query execution info to a file") {
    +    withTempDir { dir =>
    +      val path = dir.getCanonicalPath + "/plans.txt"
    --- End diff --
    
    > Also add a few cases.
    
    @gatorsmile I added a few tests. Please, take a look at them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98475/testReport)** for PR 22429 at commit [`bda6ac2`](https://github.com/apache/spark/commit/bda6ac25882fdff5bf66b845cd6935154f1f744e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r223979931
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/CatalystDataToAvro.scala ---
    @@ -52,7 +52,7 @@ case class CatalystDataToAvro(child: Expression) extends UnaryExpression {
         out.toByteArray
       }
     
    -  override def simpleString: String = {
    +  override def simpleString(maxFields: Option[Int]): String = {
         s"to_avro(${child.sql}, ${child.dataType.simpleString})"
    --- End diff --
    
    Should we also pass the maxFields to `child.dataType`? For example `StructType` fields are truncated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r224415510
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -633,4 +633,14 @@ package object config {
           .stringConf
           .toSequence
           .createWithDefault(Nil)
    +
    +  private[spark] val MAX_TO_STRING_FIELDS =
    +    ConfigBuilder("spark.debug.maxToStringFields")
    +      .internal()
    +      .doc("Maximum number of fields of sequence-like entries that can be converted to strings " +
    --- End diff --
    
    I am going to change it to `Maximum number of fields of a tree node that can be ...` for the SQL config.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97580/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96551/testReport)** for PR 22429 at commit [`90ff7b5`](https://github.com/apache/spark/commit/90ff7b543967d8f98eaa35dd456ef43d3866097f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217915212
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala ---
    @@ -16,11 +16,33 @@
      */
     package org.apache.spark.sql.execution
     
    +import scala.io.Source
    +
     import org.apache.spark.sql.AnalysisException
     import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, OneRowRelation}
     import org.apache.spark.sql.test.SharedSQLContext
     
     class QueryExecutionSuite extends SharedSQLContext {
    +  test("dumping query execution info to a file") {
    +    withTempDir { dir =>
    +      val path = dir.getCanonicalPath + s"/plans.txt"
    --- End diff --
    
    We don't need string interpolation here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r220195279
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +170,56 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
    +   */
    +  val DEFAULT_MAX_TO_STRING_FIELDS = 25
    +
    +  private[spark] def maxNumToStringFields = {
    +    if (SparkEnv.get != null) {
    +      SparkEnv.get.conf.getInt("spark.debug.maxToStringFields", DEFAULT_MAX_TO_STRING_FIELDS)
    --- End diff --
    
    Since it is moved to sql, shall we name it as `spark.sql.debug.maxToStringFields`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by boy-uber <gi...@git.apache.org>.
Github user boy-uber commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    > @boy-uber, for structured streaming, let's do it out of this PR. I think the actual change of this PR can be small (1.). We can change this API for structured streaming later if needed since this is just an internal private API. Change (2.) can be shared I guess even if we go for structured streaming stuff.
    
    Hi @HyukjinKwon thanks for the comment! My original suggestion was not related to structured streaming. It was to support structured logging for Spark plan, for example, get the Spark plan as a json blob, and send it to Spark Listener. This will make it easy to Spark users to write a Spark Listener plugin to get the plan and process the json in their code. Then people could parse and analyze a large number of Spark applications based on their plans. One use case for this is using machine learning to predict resource usage based on the Spark plans. This may not be related to this PR. If you or other people are interested, we could talk offline.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96093/testReport)** for PR 22429 at commit [`2ee75bc`](https://github.com/apache/spark/commit/2ee75bcd4d495eb6031581ad7a38e757c254175b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @MaxGekk please just modify simpleString it is internal API for this reason.
    
    @rednaxelafx rope approach has the benefit that it does not create a ton of intermediate buffers. We could do that as well.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217915071
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +253,35 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sparkContext.hadoopConfiguration)
    --- End diff --
    
    Why use the hadoop configuration of the `SparkContext`? It is probably better to use the one that `sparkSession.sessionState.newHadoopConf()` provides.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by boy-uber <gi...@git.apache.org>.
Github user boy-uber commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Great to see the code here to capture more information from the plan and save to file. We have similar needs with our Spark applications as well.
    
    Adding my two cents:
    
    1. Anyone feel the need to log the spark plan as a json or other structured representation? Current text based logging for spark plan is good for human, but hard to be processed by code/tool.
    
    2. Could we generalize the solution a little bit? for example, writing the logical/physical plan json representation (or similar tree structure) to Spark listener, so people could write their own plugin to either write to file, or write to database, or write to kafka, etc.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r219729889
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +265,22 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    --- End diff --
    
    val fs = filePath.getFileSystem(spark.sessionState.newHadoopConf())


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @cloud-fan @gatorsmile May I ask you to look at the PR. It stuck for a while by unclear reasons but I believe the proposed method `toFile` could be pretty useful in troubleshooting different issues, and I just don't want to close it so easily.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96718/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97600/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218307450
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala ---
    @@ -16,11 +16,33 @@
      */
     package org.apache.spark.sql.execution
     
    +import scala.io.Source
    +
     import org.apache.spark.sql.AnalysisException
     import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, OneRowRelation}
     import org.apache.spark.sql.test.SharedSQLContext
     
     class QueryExecutionSuite extends SharedSQLContext {
    +  test("dumping query execution info to a file") {
    +    withTempDir { dir =>
    +      val path = dir.getCanonicalPath + "/plans.txt"
    --- End diff --
    
    Also add a few cases. For example, the path already exists? the path is not legal? and so on.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98475/testReport)** for PR 22429 at commit [`bda6ac2`](https://github.com/apache/spark/commit/bda6ac25882fdff5bf66b845cd6935154f1f744e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98465/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218197544
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       def treeString: String = treeString(verbose = true)
     
       def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
    -    generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString
    +    val baos = new ByteArrayOutputStream()
    --- End diff --
    
    This is done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r219729921
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +265,22 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new OutputStreamWriter(fs.create(filePath))
    --- End diff --
    
    cc @zsxwing Could you help review this function?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97302/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96154/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Here is a PR introduces `maxFields` parameter to all function involved in creation of truncated strings of spark plans: https://github.com/apache/spark/pull/23159


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98461/testReport)** for PR 22429 at commit [`76f4248`](https://github.com/apache/spark/commit/76f424830418129c12a2a08d81f19377490c95eb).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Is there anything which blocks the PR for now?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97580/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96713/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96154/testReport)** for PR 22429 at commit [`71ff7d1`](https://github.com/apache/spark/commit/71ff7d1387fbe7d30299fe38471bce26fe73dad5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    I am able to address his comments for his vacation. Please keep reviewing this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r223980665
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -455,21 +457,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       }.mkString(", ")
     
       /** ONE line description of this node. */
    -  def simpleString: String = s"$nodeName $argString".trim
    +  def simpleString(maxFields: Option[Int]): String = {
    --- End diff --
    
    Please document the `maxFields` parameter. I am especially interested in what `None` represents here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96713/testReport)** for PR 22429 at commit [`2bf11fc`](https://github.com/apache/spark/commit/2bf11fcb1c22d7118c2bcb24cc279494ea953482).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    I took a super quick pass - the change actually quite looks okay in general to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96175/testReport)** for PR 22429 at commit [`d3fede1`](https://github.com/apache/spark/commit/d3fede192123bd490b5a6a875559fa43d43bfc3e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96178/testReport)** for PR 22429 at commit [`3324927`](https://github.com/apache/spark/commit/33249274cd9b2fa8ff5e934fe572123e562bf105).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96192/testReport)** for PR 22429 at commit [`24dbbba`](https://github.com/apache/spark/commit/24dbbbadc100bcc1b66ad1f43afe65d86401d835).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96551/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @gatorsmile @zsxwing @hvanhovell @viirya @rednaxelafx @HyukjinKwon Please, take a look at the PR. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97253/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96175/testReport)** for PR 22429 at commit [`d3fede1`](https://github.com/apache/spark/commit/d3fede192123bd490b5a6a875559fa43d43bfc3e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96097/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Tests failed on `PythonForeachWriterSuite`. Waiting for https://github.com/apache/spark/pull/22452


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218303899
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -99,10 +99,11 @@ private[spark] object Utils extends Logging {
        * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
        */
       val DEFAULT_MAX_TO_STRING_FIELDS = 25
    +  val MAX_TO_STRING_FIELDS = "spark.debug.maxToStringFields"
    --- End diff --
    
    Can we move it to org.apache.spark.internal.config? I think our Core module should do the same thing like what we are doing in SQLConf. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @zsxwing Please, have a look at the PR one more time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r224391712
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +172,58 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv or by settings the SQL config
    +   * `spark.sql.debug.maxToStringFields`.
    +   */
    +  private[spark] def maxNumToStringFields: Int = {
    +    val legacyLimit = if (SparkEnv.get != null) {
    --- End diff --
    
    Taking into account that old config wasn't well documented and could use mostly in debugging, I think we can remove it in Spark 3.0. Initially the PR targeted to Spark 2.4, in the minor version removing a public config can break user apps potentially. If you are ok to remove it, I will do that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96718/testReport)** for PR 22429 at commit [`bd331c5`](https://github.com/apache/spark/commit/bd331c5f8b2e3a0642cc9f818f400af9cbcf37ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217928428
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS,
    +        Utils.DEFAULT_MAX_TO_STRING_FIELDS)
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
    +
    +      try {
    +        SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString)
    +        writer.write("== Parsed Logical Plan ==\n")
    --- End diff --
    
    Can we combine this entire block with what is done in the `toString()` method?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96527/testReport)** for PR 22429 at commit [`4ec5732`](https://github.com/apache/spark/commit/4ec57326612adb2d650e425d03eae70366480242).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r223983537
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +172,58 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv or by settings the SQL config
    +   * `spark.sql.debug.maxToStringFields`.
    +   */
    +  private[spark] def maxNumToStringFields: Int = {
    +    val legacyLimit = if (SparkEnv.get != null) {
    +      SparkEnv.get.conf.get(config.MAX_TO_STRING_FIELDS)
    +    } else {
    +      config.MAX_TO_STRING_FIELDS.defaultValue.get
    +    }
    +    val sqlConfLimit = SQLConf.get.maxToStringFields
    +
    +    Math.max(sqlConfLimit, legacyLimit)
    +  }
    +
    +  /** Whether we have warned about plan string truncation yet. */
    +  private val truncationWarningPrinted = new AtomicBoolean(false)
    +
    +  /**
    +   * Format a sequence with semantics similar to calling .mkString(). Any elements beyond
    +   * maxNumToStringFields will be dropped and replaced by a "... N more fields" placeholder.
    +   *
    +   * @return the trimmed and formatted string.
    +   */
    +  def truncatedString[T](
    +      seq: Seq[T],
    +      start: String,
    +      sep: String,
    +      end: String,
    +      maxFields: Option[Int]): String = {
    +    val maxNumFields = maxFields.getOrElse(maxNumToStringFields)
    --- End diff --
    
    You should document this behavior.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96718/testReport)** for PR 22429 at commit [`bd331c5`](https://github.com/apache/spark/commit/bd331c5f8b2e3a0642cc9f818f400af9cbcf37ef).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    As far as I know, @hvanhovell, you would prefer replacing `Option[Int]` by just `Int` in `simpleString` and other similar places. I can do that but I have to read the config parameter in all those places where I pass `None` at the moment. I have counted ~ 26 places. @zsxwing Do you also prefer this approach? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @rednaxelafx Thank you for such detailed comment.
    
    > It would be better to pass the max fields argument down the call chain instead of changing the Spark conf.
    
    Sure, it would be better. I just afraid I will break something by changing signature of methods like `simpleString`
    
    > The best-performing way to do this is probably to implement a custom Writer
    
    I am going to replace `StringWriter` by `StringBuilderWriter`. It should be faster though I am not sure it matters here.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218187690
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS,
    +        Utils.DEFAULT_MAX_TO_STRING_FIELDS)
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
    +
    +      try {
    +        SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString)
    --- End diff --
    
    Herman, I am trying to do that but I am already doubt it is right ways. I have to add new argument to `verboseString` and `simpleString` which is broadly used everywhere.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r223983702
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -189,23 +192,34 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
           """.stripMargin.trim
       }
     
    +  private def writeOrError(writer: Writer)(f: Writer => Unit): Unit =
    +    try f(writer) catch { case e: AnalysisException => writer.write(e.toString) }
    --- End diff --
    
    Please use multiple lines here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @boy-uber the thing you are suggesting is a pretty big undertaking and beyond the scope of this PR.
    
    If you are going to add structured plans to the explain output, you probably also want some guarantees about stability over multiple spark versions and you probably also want to be able to reconstruct the plan. Neither is easy. If you want to have this in Spark, then I suggest sending a proposal to the dev list.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96178/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r218235962
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -189,23 +192,32 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
           """.stripMargin.trim
       }
     
    +  private def writeOrError(writer: Writer)(f: Writer => Unit): Unit =
    +    try f(writer) catch { case e: AnalysisException => writer.write(e.toString) }
    +
    +  private def writePlans(writer: Writer): Unit = {
    +    writer.write("== Parsed Logical Plan ==\n")
    +    writeOrError(writer)(logical.treeString(_, verbose = true, addSuffix = false))
    +    writer.write("== Analyzed Logical Plan ==\n")
    +    writeOrError(writer) { writer =>
    +      analyzed.output.foreach(o => writer.write(s"${o.name}: ${o.dataType.simpleString}"))
    --- End diff --
    
    If you want to get the best out of this approach, it might be better to avoid string interpolation here and do the explicit `writer.write` calls for the things you're interpolating on.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r220409552
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +170,56 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
    +   */
    +  val DEFAULT_MAX_TO_STRING_FIELDS = 25
    +
    +  private[spark] def maxNumToStringFields = {
    +    if (SparkEnv.get != null) {
    +      SparkEnv.get.conf.getInt("spark.debug.maxToStringFields", DEFAULT_MAX_TO_STRING_FIELDS)
    --- End diff --
    
    Shall we have both deprecated old config and new sql config, then remove old config in next major version?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98475/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96154/testReport)** for PR 22429 at commit [`71ff7d1`](https://github.com/apache/spark/commit/71ff7d1387fbe7d30299fe38471bce26fe73dad5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96109/testReport)** for PR 22429 at commit [`ce2c086`](https://github.com/apache/spark/commit/ce2c08688bb8b51e97f686c95279a5f42b52116a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96093/testReport)** for PR 22429 at commit [`2ee75bc`](https://github.com/apache/spark/commit/2ee75bcd4d495eb6031581ad7a38e757c254175b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r217928262
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       def treeString: String = treeString(verbose = true)
     
       def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
    -    generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString
    +    val baos = new ByteArrayOutputStream()
    --- End diff --
    
    What is the benefit of using this instead of using a `java.io.StringWriter` or `org.apache.commons.io.output.StringBuilderWriter`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96551/testReport)** for PR 22429 at commit [`90ff7b5`](https://github.com/apache/spark/commit/90ff7b543967d8f98eaa35dd456ef43d3866097f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by boy-uber <gi...@git.apache.org>.
Github user boy-uber commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    > > @MaxGekk I sent email to spark dev list about structured plan logging, but did not get any response.
    > 
    > @boy-uber I guess It is better to speak about the feature to @bogdanrdc @hvanhovell @larturus
    
    Thanks @MaxGekk for the contact list! I will ping them to gather more thoughts.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98324/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98324/testReport)** for PR 22429 at commit [`76f4248`](https://github.com/apache/spark/commit/76f424830418129c12a2a08d81f19377490c95eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r223982046
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +172,58 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv or by settings the SQL config
    +   * `spark.sql.debug.maxToStringFields`.
    +   */
    +  private[spark] def maxNumToStringFields: Int = {
    +    val legacyLimit = if (SparkEnv.get != null) {
    --- End diff --
    
    Just for context why do you want to retain the legacy behavior? It is probably fine to break it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96214/testReport)** for PR 22429 at commit [`24dbbba`](https://github.com/apache/spark/commit/24dbbbadc100bcc1b66ad1f43afe65d86401d835).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    > @MaxGekk I sent email to spark dev list about structured plan logging, but did not get any response. 
    
    @boy-uber I guess It is better to speak about the feature to @bogdanrdc @hvanhovell @larturus


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dump query execution info to a file

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96097/testReport)** for PR 22429 at commit [`9b2a3e6`](https://github.com/apache/spark/commit/9b2a3e664564182e325ef08785c998c6ff9b5367).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98465/testReport)** for PR 22429 at commit [`f7de26d`](https://github.com/apache/spark/commit/f7de26d39df135d38b87660e5f68764891106508).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Ooops i rished to read. Yea but still sounds related but orthogonal. Let's move it to mailing list. That should be the best place to discuss further.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97256/testReport)** for PR 22429 at commit [`9b72104`](https://github.com/apache/spark/commit/9b721046c103f55d0abd4f65fd3e2e4b60be5ace).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r219969937
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -250,5 +265,22 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         def codegenToSeq(): Seq[(String, String)] = {
           org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
         }
    +
    +    /**
    +     * Dumps debug information about query execution into the specified file.
    +     */
    +    def toFile(path: String): Unit = {
    +      val filePath = new Path(path)
    +      val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf())
    +      val writer = new OutputStreamWriter(fs.create(filePath))
    --- End diff --
    
    `val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath), UTF_8))`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r220412301
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +170,56 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
    +   */
    +  val DEFAULT_MAX_TO_STRING_FIELDS = 25
    +
    +  private[spark] def maxNumToStringFields = {
    +    if (SparkEnv.get != null) {
    +      SparkEnv.get.conf.getInt("spark.debug.maxToStringFields", DEFAULT_MAX_TO_STRING_FIELDS)
    --- End diff --
    
    I think it makes sense to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @hvanhovell Could you look at the PR, please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    This is hard to review, do you mean we should add `maxFields: Option[Int]` to all the string related methods?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98461/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @boy-uber, for structured streaming, let's do it out of this PR. I think the actual change of this PR can be small (1.). We can change this API for structured streaming later if needed since this is just an internal private API. Change (2.) can be shared I guess even if we go for structured streaming stuff.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96175/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r220200837
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +170,56 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
    +   */
    +  val DEFAULT_MAX_TO_STRING_FIELDS = 25
    +
    +  private[spark] def maxNumToStringFields = {
    +    if (SparkEnv.get != null) {
    +      SparkEnv.get.conf.getInt("spark.debug.maxToStringFields", DEFAULT_MAX_TO_STRING_FIELDS)
    --- End diff --
    
    Renaming it can potentially break user's apps. Can we do that in current minor version?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #96713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96713/testReport)** for PR 22429 at commit [`2bf11fc`](https://github.com/apache/spark/commit/2bf11fcb1c22d7118c2bcb24cc279494ea953482).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @rednaxelafx Please, take a look at the PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    > Could you also try to remove the default value?
    
    I have removed the default value `None` but still keep type as `Option[Int]`. @zsxwing Is it ok?
    
    > org.apache.spark.util.Utils.truncatedString seems in a wrong project. ... Could you move it to the catalyst project. 
    
    Moved to `catalyst`.
    
    > In addition, it would be great that you can change it to a SQL conf so that we can modify this config dynamically.
    
    Do you mean moving the `spark.debug.maxToStringFields` to `SQLConf` or adding another SQL config like `spark.sql.maxFieldsInTruncatedString`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #97302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97302/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    **[Test build #98324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98324/testReport)** for PR 22429 at commit [`76f4248`](https://github.com/apache/spark/commit/76f424830418129c12a2a08d81f19377490c95eb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    @cloud-fan 
    
    > This is hard to review, do you mean we should add maxFields: Option[Int] to all the string related methods?
    
    Not to all but only to methods involved to producing textual representation of different plans. Those are `simpleString` and `verboseString` in children of `TreeNode` + `simpleString` of `StructType`. The changes are trivial - I just made them by fixing compilation errors. Maybe I missed somethings but only because it wasn't touched by compiler. 
    
    Main changes are concentrated in `QueryExecution.scala` where everything begins from the `toFile` method. Other changes are consequences of that. The main idea is to write a OutputStream instead of strings materializations in memory. So, it should allow to avoid limits for string size and OOM which we see sometimes in corner cases. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97256/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22429
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22429#discussion_r224418619
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -167,6 +172,58 @@ package object util {
         builder.toString()
       }
     
    +  /**
    +   * The performance overhead of creating and logging strings for wide schemas can be large. To
    +   * limit the impact, we bound the number of fields to include by default. This can be overridden
    +   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv or by settings the SQL config
    +   * `spark.sql.debug.maxToStringFields`.
    +   */
    +  private[spark] def maxNumToStringFields: Int = {
    +    val legacyLimit = if (SparkEnv.get != null) {
    --- End diff --
    
    I removed old core config and leaved only SQL config


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org