You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/10/03 19:19:52 UTC

[PR] [WIP][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

MaxGekk opened a new pull request, #43203:
URL: https://github.com/apache/spark/pull/43203

   ### What changes were proposed in this pull request?
   In the PR, I propose to refer to the unescaping rules added by https://github.com/apache/spark/pull/43152 from expression descriptions like in `Like`, see
   <img width="1081" alt="Screenshot 2023-10-03 at 22 19 18" src="https://github.com/apache/spark/assets/1580697/088cd6ad-38e0-4ecd-845f-33009f4a6525">
   
   
   ### Why are the changes needed?
   To improve user experience w/ Spark SQL. 
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Manually generated docs and checked by eyes.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1347515797


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,11 +92,14 @@ abstract class StringRegexExpression extends BinaryExpression
           _ matches any one character in the input (similar to . in posix regular expressions)\
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)<br><br>
-          Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-          to match "\abc", the pattern should be "\\abc".<br><br>
+          Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+          rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back
           to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
-          enabled, the pattern to match "\abc" should be "\abc".
+          enabled, the pattern to match "\abc" should be "\abc".<br><br>
+          The `pattern` argument might be a raw string literal (with the `r` prefix) to avoid

Review Comment:
   I think we should be more strong-opinioned here:
   ```
   It's recommended to use a raw string literal (with the `r` prefix) to
   avoid escaping special characters in the pattern string if exists.
   ```
   Then we add some examples to use raw string literal (should be simpler than turning off `escapedStringLiterals`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43203:
URL: https://github.com/apache/spark/pull/43203#issuecomment-1749508460

   Merging to master. Thank you, @cloud-fan for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1345401798


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,8 +92,9 @@ abstract class StringRegexExpression extends BinaryExpression
           _ matches any one character in the input (similar to . in posix regular expressions)\
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)<br><br>
-          Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-          to match "\abc", the pattern should be "\\abc".<br><br>
+          Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+          rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back

Review Comment:
   and shall we recommend people to use raw string literal as it's common to put sepcial chars in the pattern string?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43203:
URL: https://github.com/apache/spark/pull/43203#issuecomment-1748524870

   @cloud-fan Could you review this PR, please. The failed test is not related to this changes (The same is failing in master commits).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1346374515


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,8 +92,9 @@ abstract class StringRegexExpression extends BinaryExpression
           _ matches any one character in the input (similar to . in posix regular expressions)\
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)<br><br>
-          Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-          to match "\abc", the pattern should be "\\abc".<br><br>
+          Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+          rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1347515797


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,11 +92,14 @@ abstract class StringRegexExpression extends BinaryExpression
           _ matches any one character in the input (similar to . in posix regular expressions)\
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)<br><br>
-          Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-          to match "\abc", the pattern should be "\\abc".<br><br>
+          Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+          rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back
           to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
-          enabled, the pattern to match "\abc" should be "\abc".
+          enabled, the pattern to match "\abc" should be "\abc".<br><br>
+          The `pattern` argument might be a raw string literal (with the `r` prefix) to avoid

Review Comment:
   I think we should be more strong-opinioned here:
   ```
   It's recommended to use a raw string literal (with the `r` prefix) to avoid escaping special characters in the pattern string if exists.
   ```
   Then we add some examples to use raw string literal (should be the same behavior with turning off `escapedStringLiterals`)



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,11 +92,14 @@ abstract class StringRegexExpression extends BinaryExpression
           _ matches any one character in the input (similar to . in posix regular expressions)\
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)<br><br>
-          Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-          to match "\abc", the pattern should be "\\abc".<br><br>
+          Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+          rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back
           to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
-          enabled, the pattern to match "\abc" should be "\abc".
+          enabled, the pattern to match "\abc" should be "\abc".<br><br>
+          The `pattern` argument might be a raw string literal (with the `r` prefix) to avoid

Review Comment:
   I think we should be more strong-opinioned here:
   ```
   It's recommended to use a raw string literal (with the `r` prefix) to
   avoid escaping special characters in the pattern string if exists.
   ```
   Then we add some examples to use raw string literal (should be the same behavior with turning off `escapedStringLiterals`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #43203: [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions
URL: https://github.com/apache/spark/pull/43203


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1345399757


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,8 +92,9 @@ abstract class StringRegexExpression extends BinaryExpression
           _ matches any one character in the input (similar to . in posix regular expressions)\
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)<br><br>
-          Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-          to match "\abc", the pattern should be "\\abc".<br><br>
+          Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+          rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back

Review Comment:
   If we have to mention this legacy config, shall we mention it in the string literal doc?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1347786295


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -421,12 +427,14 @@ case class NotLikeAny(child: Expression, patterns: Seq[UTF8String]) extends Like
       * regexp - a string expression. The regex string should be a Java regular expression.
 
           Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL
-          parser. For example, to match "\abc", a regular expression for `regexp` can be
-          "^\\abc$".
+          parser, see the unescaping rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, to match "\abc", a regular expression for `regexp` can be "^\\abc$".
 
           There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to
           fallback to the Spark 1.6 behavior regarding string literal parsing. For example,
-          if the config is enabled, the `regexp` that can match "\abc" is "^\abc$".
+          if the config is enabled, the `regexp` that can match "\abc" is "^\abc$".<br><br>
+          It's recommended to use a raw string literal (with the `r` prefix) to avoid escaping
+          special characters in the pattern string if exists.
   """,
   examples = """

Review Comment:
   I have the separate PR for examples https://github.com/apache/spark/pull/43037 which is pending to this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45400][SQL][DOCS] Refer to the unescaping rules from expression descriptions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1347719104


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -421,12 +427,14 @@ case class NotLikeAny(child: Expression, patterns: Seq[UTF8String]) extends Like
       * regexp - a string expression. The regex string should be a Java regular expression.
 
           Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL
-          parser. For example, to match "\abc", a regular expression for `regexp` can be
-          "^\\abc$".
+          parser, see the unescaping rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+          For example, to match "\abc", a regular expression for `regexp` can be "^\\abc$".
 
           There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to
           fallback to the Spark 1.6 behavior regarding string literal parsing. For example,
-          if the config is enabled, the `regexp` that can match "\abc" is "^\abc$".
+          if the config is enabled, the `regexp` that can match "\abc" is "^\abc$".<br><br>
+          It's recommended to use a raw string literal (with the `r` prefix) to avoid escaping
+          special characters in the pattern string if exists.
   """,
   examples = """

Review Comment:
   Can we update the examples to use raw string literal?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org