You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/26 20:41:32 UTC
[GitHub] [spark] peter-toth opened a new pull request #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
peter-toth opened a new pull request #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041
### What changes were proposed in this pull request?
SPARK-21870 (https://github.com/apache/spark/commit/cb0cddffe9452937033e0e6b1fc0e600d2c787ad#diff-06dc5de6163687b7810aa76e7e152a76R146-R149) caused a significant performance regression where the number of rows is relatively low but the generated code size is large. This PR replaces the slow regular expression based comment and extra new line removal to a much faster custom code.
### Why are the changes needed?
To fix performance regression.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UTs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399891562
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
}
+
+ private object State extends Enumeration {
+ val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+ BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+ MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+ }
+
Review comment:
Could you leave some comments here about why we need to replace the existing regex with this the custom parser?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955447
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674781
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121150/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613559308
If partial revert is acceptable then how about this other PR: https://github.com/apache/spark/pull/28083
The main point there is the "revert" in `Block.length`. Other changes in `HashAggregateExec.scala` are just there to reduce the size of comments a bit.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu edited a comment on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605741776
Nice catch, @peter-toth ! I was able to reproduce this improvement in my env, too. cc: @cloud-fan @dongjoon-hyun
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687661
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032285
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25299/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406448
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400235880
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
}
+
+ private object State extends Enumeration {
+ val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+ BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+ MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+ }
+
+ def stripExtraNewLinesAndComments(input: String): String = {
Review comment:
I've brushed it up a bit. I left `if`s instead of `match/case`s where we have only 2 or less branches. Let me know if it needs more brushing.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410238
**[Test build #120526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120526/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180055
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180055
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399909385
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
Review comment:
can we use a single regex to do it in one pass?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth closed pull request #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth closed pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180064
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120595/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675211
**[Test build #120439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120439/testReport)** for PR 28041 at commit [`a1a7275`](https://github.com/apache/spark/commit/a1a72752300997719a047efaf51e8795b72c7783).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032269
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607421808
IMHO, `CodegenContext.registerComment` is created for a comment that includes dynamic text (e.g. query plan). `registerComment` creates an another comment.
For example, [this line](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L372) should be generated by `registerComment`. But, [other lines](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L373) is still static. I think that the latter is not suitable for `registerComment`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606079585
Can we fix them instead? I think we should use a standard way to add comments, which uses consistent code style so it's easy to match.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606960746
Probably, all the comments in the generated codes should appear with `CodegenContext.registerComment`. If so, I think we don't need to strip the comments in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L926.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438589
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120526/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399665439
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
##########
@@ -76,13 +76,49 @@ class CodeFormatterSuite extends SparkFunSuite {
val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
assert(reducedCode ===
- """
- |public function() {
+ """public function() {
Review comment:
the new implementation removes heading and trailing spaces too
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768131
**[Test build #120439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120439/testReport)** for PR 28041 at commit [`a1a7275`](https://github.com/apache/spark/commit/a1a72752300997719a047efaf51e8795b72c7783).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606178792
**[Test build #120595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120595/testReport)** for PR 28041 at commit [`2ca1354`](https://github.com/apache/spark/commit/2ca1354b980f280e408e6a27f5f947a6f8b7039b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605450976
Ok, I kept old code and added a new benchmark.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607643280
Yup. Code comments are for the people who read the code. As long as the readers can see the comments, the comments don't need to be actually put into the generated code.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410331
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25232/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773325
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120440/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451080
**[Test build #120531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120531/testReport)** for PR 28041 at commit [`8875a74`](https://github.com/apache/spark/commit/8875a74f6186ca442ca4656e99da1f655e358bbb).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675211
**[Test build #120439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120439/testReport)** for PR 28041 at commit [`a1a7275`](https://github.com/apache/spark/commit/a1a72752300997719a047efaf51e8795b72c7783).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773321
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642680
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406427
**[Test build #120524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120524/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451337
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25237/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675940
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25147/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk edited a comment on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607624670
Got it. @maropu suggested me that `registerComment` translates a comment to empty string as default. I overlooked that `spark.sql.codegen.comments` is false as default.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612755145
> Pretty big amount of the source code we currently generate is comment or extra whitespace/newline.
If so, how about defining a more light-weight logic to count the #characters in code blocks like this https://github.com/apache/spark/compare/master...maropu:SPARK-30564 ? I looked over the existing comments on generated code and it seems most comments have simple forms like this;
```
// comments
public void someMethod() {
// comments
some code; // comments
}
```
So, I think we don't need full-fledged functionality to remove generated comments. Also, `CodeFormatter` is for [debugging purpose](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala#L24) and I think `Block.length` should not depend on the class. When we need to add complicated comments (e.g., multi-line comments with `/* ... */`), I think they should come with `CodegenContext.registerComment`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606031639
**[Test build #120595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120595/testReport)** for PR 28041 at commit [`2ca1354`](https://github.com/apache/spark/commit/2ca1354b980f280e408e6a27f5f947a6f8b7039b).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438585
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612844144
Hmm, that lightweight solution looks slower than the full-fledged one in this PR and it doesn't help with the second issue: Method splitting can be different, depending on the value of `spark.sql.codegen.comments`.
`CodeFormatter. stripExtraNewLinesAndComments` wasn't added for debug purposes in this commit initially: https://github.com/apache/spark/commit/76fb173dd639baa9534486488155fc05a71f850e. We can find a better place for it if necessary.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687666
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25148/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451337
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25237/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642681
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25837/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth edited a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613541041
@kiszk, yes I did a comparison. The change in `Block.length` causes the regression. If we just revert that part then we don't have regression (just probably less optimal decisions on method splitting as `Block.length` is only used for that).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768552
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120439/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604923815
Results for the benchmark in the JIRA (https://issues.apache.org/jira/browse/SPARK-30564) before this PR:
```
parsing large select: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
2500 select expressions 314 806 612 0.0 313617893.0 0.0X
```
After this PR:
```
parsing large select: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
2500 select expressions 120 131 11 0.0 120476036.0 0.0X
```
https://github.com/apache/spark/pull/27078/files#diff-8d27bbf2f73a68bf0c2025f0702f7332R74 before this PR:
```
deeply nested struct field r/w: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem) 1239 1256 25 0.1 12387.6 0.0X
```
After this PR:
```
deeply nested struct field r/w: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem) 233 250 13 0.4 2329.8 0.1X
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674503
**[Test build #121150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121150/testReport)** for PR 28041 at commit [`f7fbe17`](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405457
**[Test build #120524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120524/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu edited a comment on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604970720
I couldn't clearly understand the relationship between the regression and this PR. Could you describe more about whats' a root cause and how-to-fix in the PR description? This issue is related to `spark.sql.codegen.methodSplitThreshold`? I know how to check the number of lines of generated code is pretty rough though.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955453
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120482/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773325
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120440/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400232583
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
##########
@@ -76,13 +76,49 @@ class CodeFormatterSuite extends SparkFunSuite {
val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
assert(reducedCode ===
- """
- |public function() {
+ """public function() {
|code_body
|code_body
|code_body
- |}
- """.stripMargin)
+ |}""".stripMargin)
+ }
+
+ test("removing extra new lines and comments 2") {
Review comment:
ok, renamed
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399890609
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
}
+
+ private object State extends Enumeration {
+ val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+ BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+ MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+ }
+
+ def stripExtraNewLinesAndComments(input: String): String = {
Review comment:
Could you brush up code more (e.g., keep the same code style for consistency) by referring `IntervalUtils. stringToInterval`? Probably, I think @MaxGekk could help you ;)
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala#L623
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675924
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan edited a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606038148
AFAIK we put placeholders of comments when generating the code, and expand placeholders to comments when printing/logging the code. Why is there a perf problem of removing comments from the generated code?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605008449
UT failure seems to be unrelated.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405639
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606490888
Ok, I will look into this and probably open a new PR.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498122
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613551958
@peter-toth Thank you for your clarification. I think that it looks good to partially revert the part regarding `block.length`. As you said, while it is not optimal decision, it still would catch many cases.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451331
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904172
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606038148
AFAIK we put placeholders of comments when generating the code, and expand placeholders to comments when printing/logging the code. Why is there a perf problem of replacing comments from the generated code?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604773321
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604903736
**[Test build #120482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120482/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642680
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405272
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405639
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399665467
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
##########
@@ -61,28 +61,36 @@ class CodeFormatterSuite extends SparkFunSuite {
| * line
| * comments
| */
+ | // comment
|
|public function() {
|/*comment*/
| /*comment_with_space*/
- |code_body
+ |code_body /* comment */ code_body/**/code_body // comment
|//comment
- |code_body
+ |code_body /*
+ | * multi
+ | * line
+ | * comments
+ | */
| //comment_with_space
|
- |code_body
+ |code_body // comment
+ | /*
+ | * multi
+ | * line
+ | * comments
+ | */
|}
""".stripMargin
val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
assert(reducedCode ===
- """
- |public function() {
+ """public function() {
+ |code_body code_body code_body
|code_body
|code_body
- |code_body
- |}
- """.stripMargin)
+ |}""".stripMargin)
Review comment:
ok
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613376751
I would reject to maintain a hand-written parser without significant perf benefits. `stripExtraNewLinesAndComments` shouldn't be used in perf critical code path like `Block.length`, I think making it faster is not the right direction. We should not put comments in the generated code at all. It's hard to avoid new lines in the generated code, but it should be very fast to remove new lines, or at least the hand-written parser can be very simple to only remove new lines.
For 3.0, how about we revert https://issues.apache.org/jira/browse/SPARK-21870 first to eliminate the regression? cc @rxin @dongjoon-hyun
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498126
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120531/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399888927
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
##########
@@ -76,13 +76,49 @@ class CodeFormatterSuite extends SparkFunSuite {
val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
assert(reducedCode ===
- """
- |public function() {
+ """public function() {
|code_body
|code_body
|code_body
- |}
- """.stripMargin)
+ |}""".stripMargin)
+ }
+
+ test("removing extra new lines and comments 2") {
Review comment:
How about just using the title here, `improves extra new line and comment removals`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675924
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25230/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406448
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400308973
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
Review comment:
How about calling `commentRegexp.replaceAllIn` only when we find `/` in `input`? Since the number of the lines with a comment is not large, I imagine that the regex spends the time for searching lines.
It is necessary to process each line.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607624670
Got it. @maropu suggested me that `registerComment` translates a comment to empty string as default.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410331
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25232/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606073660
As far as I see there is this `CodegenContext.registerComment` to place such comments and placeholders. But not all comments are inserted into the source code that way: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L372-L376
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904177
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25187/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604675940
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25147/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399909261
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
}
+
+ private object State extends Enumeration {
+ val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+ BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+ MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+ }
+
+ def stripExtraNewLinesAndComments(input: String): String = {
Review comment:
We build a customer parser for intervals because the input is `UTF8String`. Is there any standard way to do the string replacement efficiently in the java world?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu edited a comment on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604970720
I couldn't clearly understand the relationship between the regression and this PR. Could you describe more about whats' a root cause and how-to-fix in the PR description? This issue is related to `spark.sql.codegen.methodSplitThreshold`? I know how to check the number of lines of generated code is pretty rough now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612877040
I think the performance difference comes from the overhead of monadic chains and [your commit](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee). But, not sure we need the further optimization (converting it into the while loop) in terms of code maintenability.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606031639
**[Test build #120595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120595/testReport)** for PR 28041 at commit [`2ca1354`](https://github.com/apache/spark/commit/2ca1354b980f280e408e6a27f5f947a6f8b7039b).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955453
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120482/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410238
**[Test build #120526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120526/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768545
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642563
**[Test build #121150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121150/testReport)** for PR 28041 at commit [`f7fbe17`](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498122
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604989121
@maropu hope it makes sense now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400239074
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
Review comment:
`commentRegexp` is the source of slowness not the 2 passes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606644695
I opened https://github.com/apache/spark/pull/28083 with your suggested changes that basically reverts `Block.length` and changes `HashAggregateExec` to place comments properly. It's much smaller and fixes the performance issue observed in `WideSchemaBenchmark`.
But I'm not sure that the changes:
- at `Block.length` to stip comments before returning length and
- at `CodeFormatter.stripExtraNewLinesAndComments` in this PR to strip comments much faster
are useless as these 2 functions are used at other places too:
- https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1053
- https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L160
- https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L926
These usages look more general and as far as I see we have many comments scattered around in the generated code.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r401839081
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/CodeFormatterBenchmark.scala
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
Review comment:
Shall we use the same package name of the target, i.e., `org.apache.spark.sql.catalyst.expressions.codegen`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955311
**[Test build #120482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120482/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768545
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk edited a comment on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk edited a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613551958
@peter-toth Thank you for your clarification. I agree the partial revert. I think that it looks good to partially revert the part regarding `block.length`. As you said, while it is not optimal decision, it still would catch many cases.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687666
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25148/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687661
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604955447
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604903736
**[Test build #120482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120482/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410328
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032285
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25299/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613541041
@kiszk, yes I did a comparison. The change in `Block.length` causes the regression. If we just revert that part then there is no regression.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25230/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r401965969
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/CodeFormatterBenchmark.scala
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
Review comment:
+1
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438585
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400232868
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
Review comment:
sure, moved
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400308973
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
Review comment:
How about calling `commentRegexp.replaceAllIn` only when we find `/` in `input`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-614512587
Closing this as https://github.com/apache/spark/pull/28083 was merged.
Thanks all for the review.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904177
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25187/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451080
**[Test build #120531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120531/testReport)** for PR 28041 at commit [`8875a74`](https://github.com/apache/spark/commit/8875a74f6186ca442ca4656e99da1f655e358bbb).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406451
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120524/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687031
**[Test build #120440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120440/testReport)** for PR 28041 at commit [`f70422e`](https://github.com/apache/spark/commit/f70422e9d38ab8f51c68f024a8d6c5afa0f71d5d).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399628268
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala
##########
@@ -61,28 +61,36 @@ class CodeFormatterSuite extends SparkFunSuite {
| * line
| * comments
| */
+ | // comment
|
|public function() {
|/*comment*/
| /*comment_with_space*/
- |code_body
+ |code_body /* comment */ code_body/**/code_body // comment
|//comment
- |code_body
+ |code_body /*
+ | * multi
+ | * line
+ | * comments
+ | */
| //comment_with_space
|
- |code_body
+ |code_body // comment
+ | /*
+ | * multi
+ | * line
+ | * comments
+ | */
|}
""".stripMargin
val reducedCode = CodeFormatter.stripExtraNewLinesAndComments(code)
assert(reducedCode ===
- """
- |public function() {
+ """public function() {
+ |code_body code_body code_body
|code_body
|code_body
- |code_body
- |}
- """.stripMargin)
+ |}""".stripMargin)
Review comment:
Can we add a new test unit instead of modifying the existing tests?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605498126
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120531/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605497900
**[Test build #120531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120531/testReport)** for PR 28041 at commit [`8875a74`](https://github.com/apache/spark/commit/8875a74f6186ca442ca4656e99da1f655e358bbb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606180064
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120595/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r407222956
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/CodeFormatterBenchmark.scala
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
Review comment:
Ok, moved.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604768552
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120439/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604687031
**[Test build #120440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120440/testReport)** for PR 28041 at commit [`f70422e`](https://github.com/apache/spark/commit/f70422e9d38ab8f51c68f024a8d6c5afa0f71d5d).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604970720
I couldn't clearly understand the relationship between the regression and this PR. Could you describe more about whats' a root cause and how-to-fix in the PR? This issue is related to `spark.sql.codegen.methodSplitThreshold`? I know how to check the number of lines of generated code is pretty rough though.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410054
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674781
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121150/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-613532372
Let me clarify one point. Does regression mean in the description the difference between w/21870 and w/o 21870?
IMHO, [the performance experiment](https://github.com/apache/spark/pull/28041#issuecomment-604923815) shows the performance difference between w/this PR and w/o this PR. Did we have the performance comparisons in the benchmarks between w/21870 and w/o 21870?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605741776
Nice catch, @peter-toth ! I was able to reproduce this improvement in my env. cc: @cloud-fan @dongjoon-hyun
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604924677
cc @MaxGekk , @maropu
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612643258
I looked into this issue a bit deeper. Let me share my thoughts:
- I still don't think reverting the comment removal chage that @maropu added to `Block.length` would be a good idea for 2 reasons:
- Pretty big amount of the source code we currently generate is comment or extra whitespace/newline. I admit that if we slowly update all comments to use `CodegenContext.registerComment` can help to reduce code size, but why don't we remove the comment removal logic only when we reach that desired state?
- `CodegenContext.registerComment` has some issue here as it also adds some comment placeholders to the source code. If we don't remove those comment placeholders in `Block.length` then method splitting can be different when `spark.sql.codegen.comments` is enabled or disabled, which can be very confusing.
- Since `Block.length` is still a costly operation in this PR, this commit:
https://github.com/apache/spark/pull/28041/commits/f7fbe1777651cd93c1f8a33547854c5ba61963ee helped to further reduce run time:
```
deeply nested struct field r/w: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem) 137 164 27 0.7 1367.6 0.2X
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-607029102
Agree with @maropu . We can slowly update all comments with `CodegenContext.registerComment`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605410328
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r399889950
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
Review comment:
Could you move the existing regex below into the benchmark code side? I think we don't need this in this class.
```
val commentHolder = """\/\*(.+?)\*\/""".r
val commentRegexp =
("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" + // strip /*comment*/
"""([ |\t]*?\/\/[\s\S]*?\n)""").r // strip //comment
val extraNewLinesRegexp = """\n\s*\n""".r // strip extra newlines
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] kiszk commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400308973
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
Review comment:
How about calling `commentRegexp.replaceAllIn` only when we find `/` in `input`? Since the number of the lines with a comment is not large, I imagine that the regex spends the time for searching lines.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438342
**[Test build #120526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120526/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605406451
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120524/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605451331
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400236110
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
}
+
+ private object State extends Enumeration {
+ val TEXT, SEPARATOR, SEPARATOR_WITH_NEWLINE, MAYBE_COMMENT, LINE_COMMENT, BLOCK_COMMENT,
+ BLOCK_COMMENT_WITH_NEWLINE, MAYBE_BLOCK_COMMENT_CLOSING,
+ MAYBE_BLOCK_COMMENT_WITH_NEWLINE_CLOSING = Value
+ }
+
Review comment:
ok, done
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041:
[WIP][SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604904172
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605438589
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120526/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [WIP][SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-604772905
**[Test build #120440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120440/testReport)** for PR 28041 at commit [`f70422e`](https://github.com/apache/spark/commit/f70422e9d38ab8f51c68f024a8d6c5afa0f71d5d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674778
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#discussion_r400239074
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
##########
@@ -95,9 +95,128 @@ object CodeFormatter {
new CodeAndComment(code.result().trim(), map)
}
- def stripExtraNewLinesAndComments(input: String): String = {
+ def stripExtraNewLinesAndCommentsUsingRegexp(input: String): String = {
extraNewLinesRegexp.replaceAllIn(commentRegexp.replaceAllIn(input, ""), "\n")
Review comment:
`commentRegexp` is the source of slowness not the 2 pass.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642563
**[Test build #121150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121150/testReport)** for PR 28041 at commit [`f7fbe17`](https://github.com/apache/spark/commit/f7fbe1777651cd93c1f8a33547854c5ba61963ee).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612674778
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405457
**[Test build #120524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120524/testReport)** for PR 28041 at commit [`12ce03c`](https://github.com/apache/spark/commit/12ce03c58463c70cc07f87fcecff8540630c0cf9).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-606032269
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on issue #28041: [SPARK-30564][SQL]
Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
maropu commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-605405197
Could you add some benchmark code as compared to the existing regex, just like `HashBenchmark` and the result in the `/benchmark` dir?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28041:
[SPARK-30564][SQL] Improved extra new line and comment remove
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove
URL: https://github.com/apache/spark/pull/28041#issuecomment-612642681
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25837/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org