You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/05 18:00:50 UTC

[GitHub] [spark] LorenzoMartini opened a new pull request, #36457: [SPARK-39107] Account for empty string input in regex replace

LorenzoMartini opened a new pull request, #36457:
URL: https://github.com/apache/spark/pull/36457

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   When trying to perform a regex replace, account for the possibility of having empty strings as input.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   https://github.com/apache/spark/pull/29891 was merged to address https://issues.apache.org/jira/browse/SPARK-30796 and introduced a bug that would not allow regex matching on empty strings, as it would account for position within substring but not consider the case where input string has length 0 (empty string)
   
   From https://issues.apache.org/jira/browse/SPARK-39107 there is a change in behavior between spark versions.
   3.0.2
   ```
   scala> val df = spark.sql("SELECT '' AS col")
   df: org.apache.spark.sql.DataFrame = [col: string]
   
   scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
   +---+--------+
   |col|replaced|
   +---+--------+
   |   | <empty>|
   +---+--------+
   ```
   3.1.2
   ```
   scala> val df = spark.sql("SELECT '' AS col")
   df: org.apache.spark.sql.DataFrame = [col: string]
   
   scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
   +---+--------+
   |col|replaced|
   +---+--------+
   |   |        |
   +---+--------+
   ```
   
   The 3.0.2 outcome is the expected and correct one
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes compared to spark 3.2.1, as it brings back the correct behavior when trying to regex match empty strings.
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Added special casing test in `RegexpExpressionsSuite.RegexReplace` with empty string replacement.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867943111


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   @beliefer I changed the test to be grouped with the others in e2d9948a56495f9affe148661916178e6c5b22a5. Let me know if that's good :)!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867366601


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   There is a special case testing for the `nonNullExpr` so I followed that pattern instead of grouping with the other cases, since this is also a special case and wanted to keep it separate.
   I am happy to change it though, should I just add `row7` above with the other rows and the `checkEvaluation` bits together with the other `checkEvauation` bits then?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r868715276


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   @LorenzoMartini LGTM although I see it later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1121713316

   Question for, maybe, @dongjoon-hyun -- this is basically a bug fix for a behavior change in 3.1.0. I'd merge this to master and 3.3, but what about 3.1.x and 3.2.x? because fixing this does change behavior again. I'm personally a bit in favor of back-porting all the way, but just seeing if anyone has more thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866783088


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   Ah I see what you mean now! Yes you are right here, sorry my bad! Will change to that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867367428


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position == 0 || position < source.length) {

Review Comment:
   +1 to what @srowen says, the logic would be different then cause the evaluation of the position would change and trigger from one position earlier than it used to



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1121734508

   Merged to master/3.3/3.2/3.1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866773711


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   IIC `position` is just `i-1` and that is just the position the user asks the regex replace to start from no?https://github.com/apache/spark/blob/08c07a17717691f931d4d3206dd0385073f5bd08/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L598 And `i` would be 1 by default if users don't specify a position (https://github.com/apache/spark/blob/08c07a17717691f931d4d3206dd0385073f5bd08/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L601-L602). So if user doesn't specify `pos`, then it would always be 1 and therefore `position` will always be 0.
   
   So ultimately the check `position==0` is not addressing the empty string case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867301567


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   These test cases should follows the others.
   ```
   val row7 = create_row("", "^$", "<empty string>")
   ...
   checkEvaluation(expr, "<empty string>", row7)
   ...
   checkEvaluation(exprWithExceedLength, "", row7)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1120131624

   > Seems OK; let me pause a bit to see if @beliefer wants to weigh in
   
   @srowen Thank you for your ping.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867300892


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position == 0 || position < source.length) {

Review Comment:
   How about `position <= source.length` ?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -696,7 +696,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
       }
       String $source = $subject.toString();
       int $position = $pos - 1;
-      if ($position < $source.length()) {
+      if ($position == 0 || $position < $source.length()) {

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1119201477

   cc @beliefer too


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866723000


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)

Review Comment:
   Maybe we actually want to special case this and throw if user gives us a regex replace with an empty string match but explicitly sets an expected position?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866722337


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   Changed to `isEmpty` in 0e8d78555f0fc54f6ae7331efa743bdb39c30819. Thanks for the look @srowen , let me know what you think :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866766828


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   Hm, how doesn't it catch the empty string? in that case, adding `position == 0 ||` makes the check true iff the string is empty (`position < source.length` is `false`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867424551


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position == 0 || position < source.length) {

Review Comment:
   > That changes the logic; then this check would always be true?
   
   For example, the source is "abc", the position need in range [0, 3].
   If the source is "",  the position must be 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866721488


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   Yes that's a good point. Compiler actually suggests `source.isEmpty()`. The default position passed in is 1 so position will be 0 by default so `position == 0 || position < source.length` doesn't catch the empty string.
   I will replace with `isEmpty`.
   
   I would also argue that any non-default position passed in with an empty string for the regex replace is an error on the user side, so I'm wondering if we should special case that?
   As in, if user gives us a regex replace with empty string match (`^$`) and explicitly sets a position, we should probably just throw? I don't really have a strong opinion here, happy to keep as is. Seems just a bit odd to specify a position when you are matching on an empty string that should always have length 0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866773711


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   IIC `position` is just i-1 and that is just the position the user asks the regex replace to start from no (https://github.com/apache/spark/blob/08c07a17717691f931d4d3206dd0385073f5bd08/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L598)? And `i` would be 1 by default if users don't specify a position (https://github.com/apache/spark/blob/08c07a17717691f931d4d3206dd0385073f5bd08/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L601-L602). So if user doesn't specify `pos`, then it would always be 1 and therefore `position` will always be 0.
   
   So ultimately the check `position==0` is not addressing the empty string case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1119707274

   Thanks @srowen! Sounds good to me, the constraint would just be a nit and I'm happy to keep it as is to avoid additional complications / excessive fixing. Can we merge this :)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen closed pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace
URL: https://github.com/apache/spark/pull/36457


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1121319884

   All tests show that they pass. I'll wait a beat for last comments, but should be OK.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866723000


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)

Review Comment:
   Maybe we actually want to special case this and throw if user gives us a regex replace with an empty string match but explicitly sets an expected position?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866464610


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   This is minor, but check `source.length == 0` instead? faster.
   In fact, isn't this equivalent to `position == 0 || position < source.length`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866781006


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   If position is 0, then the check is currently true in all cases except `source.length == 0` (empty string, right?). Your change makes it true in this case too when position is 0. I think both versions do that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r866789627


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position < source.length || (position == 0 && source.equals(""))) {

Review Comment:
   Addressed in 2280c24048c28f2a8b4c650af9c799face08dfb9



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867352167


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position == 0 || position < source.length) {

Review Comment:
   That changes the logic; then this check would always be true?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867425434


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   > I am happy to change it though, should I just add `row7` above with the other rows and the `checkEvaluation` bits together with the other `checkEvauation` bits then?
   Looks good. Could you change them and let me see later ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867425198


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   OK. we can keep it separate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1121715555

   BTW, cc @MaxGekk for Apache Spark 3.3.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867424551


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position == 0 || position < source.length) {

Review Comment:
   > That changes the logic; then this check would always be true?
   
   For example, the source is "abc", the position need in range [0, 3].
   If the source is "",  the position must be 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LorenzoMartini commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
LorenzoMartini commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1121260349

   Github UI shows failing but nothing is actually failing here. All comments should be addressed, can we get a merge on this please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867425434


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala:
##########
@@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val nonNullExpr = RegExpReplace(Literal("100-200"), Literal("(\\d+)"), Literal("num"))
     checkEvaluation(nonNullExpr, "num-num", row1)
 
+    // Test empty string replacement
+    val emptyString = RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"))
+    checkEvaluation(emptyString, "<empty string>", create_row("", "^$", "<empty string>"))
+    val emptyStringWithPositionOne =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 1)
+    checkEvaluation(emptyStringWithPositionOne, "<empty string>", create_row(""))
+    val emptyStringWithPositionGreater =
+      RegExpReplace(Literal(""), Literal("^$"), Literal("<empty string>"), 2)
+    checkEvaluation(emptyStringWithPositionGreater, "", create_row(""))

Review Comment:
   > I am happy to change it though, should I just add `row7` above with the other rows and the `checkEvaluation` bits together with the other `checkEvauation` bits then?
   
   Looks good. Could you change them and let me see later ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36457:
URL: https://github.com/apache/spark/pull/36457#discussion_r867425109


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
     }
     val source = s.toString()
     val position = i.asInstanceOf[Int] - 1
-    if (position < source.length) {
+    if (position == 0 || position < source.length) {

Review Comment:
   Yeah, I got it. The logic looks good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1121715011

   I'm also +1 for fixing this in all applicable branches, @srowen .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #36457: [SPARK-39107] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1118888847

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #36457:
URL: https://github.com/apache/spark/pull/36457#issuecomment-1120069095

   Seems OK; let me pause a bit to see if @beliefer wants to weigh in


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org