You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/26 20:41:32 UTC

[GitHub] [spark] peter-toth opened a new pull request #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

peter-toth opened a new pull request #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041
 
 
   ### What changes were proposed in this pull request?
   
   SPARK-21870 (https://github.com/apache/spark/commit/cb0cddffe9452937033e0e6b1fc0e600d2c787ad#diff-06dc5de6163687b7810aa76e7e152a76R146-R149) caused a significant performance regression where the number of rows is relatively low but the generated code size is large. This PR replaces the slow regular expression based comment and extra new line removal to a much faster custom code.
   
   ### Why are the changes needed?
   To fix performance regression.
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Existing UTs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399891562
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
   }
+
+  private object State extends Enumeration {
+    val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+    BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+    MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+  }
+
 
 Review comment:
   Could you leave some comments here about why we need to replace the existing regex with this the custom parser?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955447
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674781
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121150/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613559308
 
 
   If partial revert is acceptable then how about this other PR: https://github.com/apache/spark/pull/28083
   The main point there is the "revert" in `Block.length`. Other changes in `HashAggregateExec.scala` are just there to reduce the size of comments a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605741776
 
 
   Nice catch, @peter-toth ! I was able to reproduce this improvement in my env, too. cc: @cloud-fan @dongjoon-hyun  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687661
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032285
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25299/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406448
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400235880
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
   }
+
+  private object State extends Enumeration {
+    val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+    BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+    MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+  }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
 
 Review comment:
   I've brushed it up a bit. I left `if`s instead of `match/case`s where we have only 2 or less branches. Let me know if it needs more brushing.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410238
 
 
   **[Test build #120526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120526/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180055
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180055
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399909385
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
 
 Review comment:
   can we use a single regex to do it in one pass?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth closed pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth closed pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180064
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120595/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675211
 
 
   **[Test build #120439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120439/testReport)** for PR 28041 at commit [`a1a7275`](https://github.com/apache/spark/commit/a1a72752300997719a047efaf51e8795b72c7783).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032269
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607421808
 
 
   IMHO, `CodegenContext.registerComment` is created for a comment that includes dynamic text (e.g. query plan). `registerComment` creates an another comment.
   
   For example, [this line](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L372) should be generated by `registerComment`. But, [other lines](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L373) is still static. I think that the latter is not suitable for `registerComment`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606079585
 
 
   Can we fix them instead? I think we should use a standard way to add comments, which uses consistent code style so it's easy to match.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606960746
 
 
   Probably, all the comments in the generated codes should appear with `CodegenContext.registerComment`. If so, I think we don't need to strip the comments in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L926.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438589
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120526/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399665439
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
 ##########
 @@ -76,13 +76,49 @@ class CodeFormatterSuite extends SparkFunSuite {
 
     val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
     assert(reducedCode ===
-      """
-        |public function() {
+      """public function() {
 
 Review comment:
   the new implementation removes heading and trailing spaces too

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768131
 
 
   **[Test build #120439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120439/testReport)** for PR 28041 at commit [`a1a7275`](https://github.com/apache/spark/commit/a1a72752300997719a047efaf51e8795b72c7783).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606178792
 
 
   **[Test build #120595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120595/testReport)** for PR 28041 at commit [`2ca1354`](https://github.com/apache/spark/commit/2ca1354b980f280e408e6a27f5f947a6f8b7039b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605450976
 
 
   Ok, I kept old code and added a new benchmark.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607643280
 
 
   Yup. Code comments are for the people who read the code. As long as the readers can see the comments, the comments don't need to be actually put into the generated code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410331
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25232/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773325
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120440/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451080
 
 
   **[Test build #120531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120531/testReport)** for PR 28041 at commit [`8875a74`](https://github.com/apache/spark/commit/8875a74f6186ca442ca4656e99da1f655e358bbb).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675211
 
 
   **[Test build #120439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120439/testReport)** for PR 28041 at commit [`a1a7275`](https://github.com/apache/spark/commit/a1a72752300997719a047efaf51e8795b72c7783).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773321
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642680
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406427
 
 
   **[Test build #120524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120524/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451337
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25237/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675940
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25147/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607624670
 
 
   Got it. @maropu suggested me that `registerComment` translates a comment to empty string as default. I overlooked that `spark.sql.codegen.comments` is false as default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612755145
 
 
   > Pretty big amount of the source code we currently generate is comment or extra whitespace/newline. 
   
   If so, how about defining a more light-weight logic to count the #characters in code blocks like this https://github.com/apache/spark/compare/master...maropu:SPARK-30564 ? I looked over the existing comments on generated code and it seems most comments have simple forms like this;
   ```
   // comments
   public void someMethod() {
     // comments
     some code; // comments
   }
   ```
   So, I think we don't need full-fledged functionality to remove generated comments. Also, `CodeFormatter` is for [debugging purpose](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala#L24) and I think `Block.length` should not depend on the class. When we need to add complicated comments (e.g., multi-line comments with `/* ... */`), I think they should come with `CodegenContext.registerComment`.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606031639
 
 
   **[Test build #120595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120595/testReport)** for PR 28041 at commit [`2ca1354`](https://github.com/apache/spark/commit/2ca1354b980f280e408e6a27f5f947a6f8b7039b).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438585
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612844144
 
 
   Hmm, that lightweight solution looks slower than the full-fledged one in this PR and it doesn't help with the second issue: Method splitting can be different, depending on the value of `spark.sql.codegen.comments`.
   
   `CodeFormatter. stripExtraNewLinesAndComments` wasn't added for debug purposes in this commit initially: https://github.com/apache/spark/commit/76fb173dd639baa9534486488155fc05a71f850e. We can find a better place for it if necessary.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687666
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25148/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451337
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25237/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642681
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25837/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613541041
 
 
   @kiszk, yes I did a comparison. The change in `Block.length` causes the regression. If we just revert that part then we don't have regression (just probably less optimal decisions on method splitting as `Block.length` is only used for that).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768552
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120439/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604923815
 
 
   Results for the benchmark in the JIRA (https://issues.apache.org/jira/browse/SPARK-30564) before this PR:
   ```
   parsing large select:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   2500 select expressions                             314            806         612          0.0   313617893.0       0.0X
   ```
   After this PR:
   ```
   parsing large select:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   2500 select expressions                             120            131          11          0.0   120476036.0       0.0X
   ```
   
   https://github.com/apache/spark/pull/27078/files#diff-8d27bbf2f73a68bf0c2025f0702f7332R74 before this PR:
   ```
   deeply nested struct field r/w:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   250 deep x 400 rows (read in-mem)                  1239           1256          25          0.1       12387.6       0.0X
   ```
   After this PR:
   ```
   deeply nested struct field r/w:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   250 deep x 400 rows (read in-mem)                   233            250          13          0.4        2329.8       0.1X
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674503
 
 
   **[Test build #121150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121150/testReport)** for PR 28041 at commit [`f7fbe17`](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405457
 
 
   **[Test build #120524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120524/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604970720
 
 
   I couldn't clearly understand the relationship between the regression and this PR. Could you describe more about whats' a root cause and how-to-fix in the PR description? This issue is related to `spark.sql.codegen.methodSplitThreshold`? I know how to check the number of lines of generated code is pretty rough though.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955453
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120482/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773325
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120440/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400232583
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
 ##########
 @@ -76,13 +76,49 @@ class CodeFormatterSuite extends SparkFunSuite {
 
     val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
     assert(reducedCode ===
-      """
-        |public function() {
+      """public function() {
         |code_body
         |code_body
         |code_body
-        |}
-      """.stripMargin)
+        |}""".stripMargin)
+  }
+
+  test("removing extra new lines and comments 2") {
 
 Review comment:
   ok, renamed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399890609
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
   }
+
+  private object State extends Enumeration {
+    val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+    BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+    MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+  }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
 
 Review comment:
   Could you brush up code more (e.g., keep the same code style for consistency) by referring `IntervalUtils. stringToInterval`? Probably, I think @MaxGekk could help you ;)
   https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala#L623
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675924
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606038148
 
 
   AFAIK we put placeholders of comments when generating the code, and expand placeholders to comments when printing/logging the code. Why is there a perf problem of removing comments from the generated code?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605008449
 
 
   UT failure seems to be unrelated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405639
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606490888
 
 
   Ok, I will look into this and probably open a new PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498122
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613551958
 
 
   @peter-toth Thank you for your clarification. I think that it looks good to partially revert the part regarding `block.length`. As you said, while it is not optimal decision, it still would catch many cases.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451331
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904172
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606038148
 
 
   AFAIK we put placeholders of comments when generating the code, and expand placeholders to comments when printing/logging the code. Why is there a perf problem of replacing comments from the generated code?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773321
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604903736
 
 
   **[Test build #120482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120482/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642680
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405272
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405639
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399665467
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
 ##########
 @@ -61,28 +61,36 @@ class CodeFormatterSuite extends SparkFunSuite {
         |  * line
         |  * comments
         |  */
+        |  // comment
         |
         |public function() {
         |/*comment*/
         |  /*comment_with_space*/
-        |code_body
+        |code_body /* comment */ code_body/**/code_body // comment
         |//comment
-        |code_body
+        |code_body /*
+        |  * multi
+        |  * line
+        |  * comments
+        |  */
         |  //comment_with_space
         |
-        |code_body
+        |code_body // comment
+        | /*
+        |  * multi
+        |  * line
+        |  * comments
+        |  */
         |}
       """.stripMargin
 
     val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
     assert(reducedCode ===
-      """
-        |public function() {
+      """public function() {
+        |code_body code_body code_body
         |code_body
         |code_body
-        |code_body
-        |}
-      """.stripMargin)
+        |}""".stripMargin)
 
 Review comment:
   ok

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613376751
 
 
   I would reject to maintain a hand-written parser without significant perf benefits. `stripExtraNewLinesAndComments` shouldn't be used in perf critical code path like `Block.length`, I think making it faster is not the right direction. We should not put comments in the generated code at all. It's hard to avoid new lines in the generated code, but it should be very fast to remove new lines, or at least the hand-written parser can be very simple to only remove new lines.
   
   For 3.0, how about we revert https://issues.apache.org/jira/browse/SPARK-21870 first to eliminate the regression? cc @rxin @dongjoon-hyun 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498126
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120531/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399888927
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
 ##########
 @@ -76,13 +76,49 @@ class CodeFormatterSuite extends SparkFunSuite {
 
     val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
     assert(reducedCode ===
-      """
-        |public function() {
+      """public function() {
         |code_body
         |code_body
         |code_body
-        |}
-      """.stripMargin)
+        |}""".stripMargin)
+  }
+
+  test("removing extra new lines and comments 2") {
 
 Review comment:
   How about just using the title here, `improves extra new line and comment removals`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675924
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405640
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25230/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406448
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400308973
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
 
 Review comment:
   How about calling `commentRegexp.replaceAllIn` only when we find `/` in `input`? Since the number of the lines with a comment is not large, I imagine that the regex spends the time for searching lines.
   It is necessary to process each line.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607624670
 
 
   Got it. @maropu suggested me that `registerComment` translates a comment to empty string as default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410331
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25232/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606073660
 
 
   As far as I see there is this `CodegenContext.registerComment` to place such comments and placeholders. But not all comments are inserted into the source code that way: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L372-L376

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904177
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25187/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675940
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25147/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399909261
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
   }
+
+  private object State extends Enumeration {
+    val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+    BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+    MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+  }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
 
 Review comment:
   We build a customer parser for intervals because the input is `UTF8String`. Is there any standard way to do the string replacement efficiently in the java world?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604970720
 
 
   I couldn't clearly understand the relationship between the regression and this PR. Could you describe more about whats' a root cause and how-to-fix in the PR description? This issue is related to `spark.sql.codegen.methodSplitThreshold`? I know how to check the number of lines of generated code is pretty rough now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612877040
 
 
   I think the performance difference comes from the overhead of monadic chains and [your commit](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee). But, not sure we need the further optimization (converting it into the while loop) in terms of code maintenability.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606031639
 
 
   **[Test build #120595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120595/testReport)** for PR 28041 at commit [`2ca1354`](https://github.com/apache/spark/commit/2ca1354b980f280e408e6a27f5f947a6f8b7039b).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955453
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120482/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410238
 
 
   **[Test build #120526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120526/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768545
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642563
 
 
   **[Test build #121150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121150/testReport)** for PR 28041 at commit [`f7fbe17`](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498122
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604989121
 
 
   @maropu hope it makes sense now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400239074
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
 
 Review comment:
   `commentRegexp` is the source of slowness not the 2 passes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606644695
 
 
   I opened https://github.com/apache/spark/pull/28083 with your suggested changes that basically reverts `Block.length` and changes `HashAggregateExec` to place comments properly. It's much smaller and fixes the performance issue observed in `WideSchemaBenchmark`.
   
   But I'm not sure that the changes:
   - at `Block.length` to stip comments before returning length and
   - at `CodeFormatter.stripExtraNewLinesAndComments` in this PR to strip comments much faster
   
   are useless as these 2 functions are used at other places too: 
   - https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1053
   - https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L160
   - https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L926
   
   These usages look more general and as far as I see we have many comments scattered around in the generated code.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r401839081
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/CodeFormatterBenchmark.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
 
 Review comment:
   Shall we use the same package name of the target, i.e., `org.apache.spark.sql.catalyst.expressions.codegen`?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955311
 
 
   **[Test build #120482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120482/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768545
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613551958
 
 
   @peter-toth Thank you for your clarification. I agree the partial revert. I think that it looks good to partially revert the part regarding `block.length`. As you said, while it is not optimal decision, it still would catch many cases.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687666
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25148/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687661
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955447
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604903736
 
 
   **[Test build #120482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120482/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410328
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032285
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25299/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613541041
 
 
   @kiszk, yes I did a comparison. The change in `Block.length` causes the regression. If we just revert that part then there is no regression.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405640
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25230/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r401965969
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/CodeFormatterBenchmark.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
 
 Review comment:
   +1

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438585
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400232868
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
 
 Review comment:
   sure, moved

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400308973
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
 
 Review comment:
   How about calling `commentRegexp.replaceAllIn` only when we find `/` in `input`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-614512587
 
 
   Closing this as https://github.com/apache/spark/pull/28083 was merged.
   
   Thanks all for the review.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904177
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25187/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451080
 
 
   **[Test build #120531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120531/testReport)** for PR 28041 at commit [`8875a74`](https://github.com/apache/spark/commit/8875a74f6186ca442ca4656e99da1f655e358bbb).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406451
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120524/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687031
 
 
   **[Test build #120440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120440/testReport)** for PR 28041 at commit [`f70422e`](https://github.com/apache/spark/commit/f70422e9d38ab8f51c68f024a8d6c5afa0f71d5d).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399628268
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
 ##########
 @@ -61,28 +61,36 @@ class CodeFormatterSuite extends SparkFunSuite {
         |  * line
         |  * comments
         |  */
+        |  // comment
         |
         |public function() {
         |/*comment*/
         |  /*comment_with_space*/
-        |code_body
+        |code_body /* comment */ code_body/**/code_body // comment
         |//comment
-        |code_body
+        |code_body /*
+        |  * multi
+        |  * line
+        |  * comments
+        |  */
         |  //comment_with_space
         |
-        |code_body
+        |code_body // comment
+        | /*
+        |  * multi
+        |  * line
+        |  * comments
+        |  */
         |}
       """.stripMargin
 
     val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
     assert(reducedCode ===
-      """
-        |public function() {
+      """public function() {
+        |code_body code_body code_body
         |code_body
         |code_body
-        |code_body
-        |}
-      """.stripMargin)
+        |}""".stripMargin)
 
 Review comment:
   Can we add a new test unit instead of modifying the existing tests?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498126
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120531/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605497900
 
 
   **[Test build #120531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120531/testReport)** for PR 28041 at commit [`8875a74`](https://github.com/apache/spark/commit/8875a74f6186ca442ca4656e99da1f655e358bbb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180064
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120595/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r407222956
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/CodeFormatterBenchmark.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
 
 Review comment:
   Ok, moved.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768552
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120439/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687031
 
 
   **[Test build #120440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120440/testReport)** for PR 28041 at commit [`f70422e`](https://github.com/apache/spark/commit/f70422e9d38ab8f51c68f024a8d6c5afa0f71d5d).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604970720
 
 
   I couldn't clearly understand the relationship between the regression and this PR. Could you describe more about whats' a root cause and how-to-fix in the PR? This issue is related to `spark.sql.codegen.methodSplitThreshold`? I know how to check the number of lines of generated code is pretty rough though.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410054
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674781
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121150/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613532372
 
 
   Let me clarify one point. Does regression mean in the description the difference between w/21870 and w/o 21870?
   
   IMHO, [the performance experiment](https://github.com/apache/spark/pull/28041#issuecomment-604923815) shows the performance difference between w/this PR and w/o this PR. Did we have the performance comparisons in the benchmarks between w/21870 and w/o 21870?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605741776
 
 
   Nice catch, @peter-toth ! I was able to reproduce this improvement in my env. cc: @cloud-fan @dongjoon-hyun  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604924677
 
 
   cc @MaxGekk , @maropu

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612643258
 
 
   I looked into this issue a bit deeper. Let me share my thoughts:
   - I still don't think reverting the comment removal chage that @maropu added to `Block.length` would be a good idea for 2 reasons:
     - Pretty big amount of the source code we currently generate is comment or extra whitespace/newline. I admit that if we slowly update all comments to use `CodegenContext.registerComment` can help to reduce code size, but why don't we remove the comment removal logic only when we reach that desired state?
     - `CodegenContext.registerComment` has some issue here as it also adds some comment placeholders to the source code. If we don't remove those comment placeholders in `Block.length` then method splitting can be different when `spark.sql.codegen.comments` is enabled or disabled, which can be very confusing.
   - Since `Block.length` is still a costly operation in this PR, this commit:
   https://github.com/apache/spark/pull/28041/commits/f7fbe1777651cd93c1f8a33547854c5ba61963ee helped to further reduce run time:
   ```
   deeply nested struct field r/w:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   250 deep x 400 rows (read in-mem)                   137            164          27          0.7        1367.6       0.2X
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607029102
 
 
   Agree with @maropu . We can slowly update all comments with `CodegenContext.registerComment`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410328
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399889950
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
 
 Review comment:
   Could you move the existing regex  below into the benchmark code side? I think we don't need this in this class.
   ```
     val commentHolder = """\/\*(.+?)\*\/""".r
     val commentRegexp =
       ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" + // strip /*comment*/
         """([ |\t]*?\/\/[\s\S]*?\n)""").r           // strip //comment
     val extraNewLinesRegexp = """\n\s*\n""".r       // strip extra newlines
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400308973
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
 
 Review comment:
   How about calling `commentRegexp.replaceAllIn` only when we find `/` in `input`? Since the number of the lines with a comment is not large, I imagine that the regex spends the time for searching lines.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438342
 
 
   **[Test build #120526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120526/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406451
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120524/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451331
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400236110
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
   }
+
+  private object State extends Enumeration {
+    val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+    BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+    MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+  }
+
 
 Review comment:
   ok, done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904172
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438589
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120526/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604772905
 
 
   **[Test build #120440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120440/testReport)** for PR 28041 at commit [`f70422e`](https://github.com/apache/spark/commit/f70422e9d38ab8f51c68f024a8d6c5afa0f71d5d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674778
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400239074
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ##########
 @@ -95,9 +95,128 @@ object CodeFormatter {
     new CodeAndComment(code.result().trim(), map)
   }
 
-  def stripExtraNewLinesAndComments(input: String): String = {
+  def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
     extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
 
 Review comment:
   `commentRegexp` is the source of slowness not the 2 pass.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642563
 
 
   **[Test build #121150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121150/testReport)** for PR 28041 at commit [`f7fbe17`](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674778
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405457
 
 
   **[Test build #120524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120524/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032269
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405197
 
 
   Could you add some benchmark code as compared to the existing regex, just like `HashBenchmark` and the result in the `/benchmark` dir?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642681
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25837/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org