Posted to issues@spark.apache.org by "Sean R. Owen (Jira)" <ji...@apache.org> on 2022/06/16 15:04:00 UTC

[jira] [Updated] (SPARK-39107) Silent change in regexp_replace's handling of empty strings

     [ https://issues.apache.org/jira/browse/SPARK-39107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-39107:
---------------------------------
    Fix Version/s: 3.1.4
                       (was: 3.1.3)

> Silent change in regexp_replace's handling of empty strings
> -----------------------------------------------------------
>
>                 Key: SPARK-39107
>                 URL: https://issues.apache.org/jira/browse/SPARK-39107
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Willi Raschkowski
>            Assignee: Lorenzo Martini
>            Priority: Major
>              Labels: correctness, release-notes
>             Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change that a) seems incorrect, and b) is undocumented in the [migration guide|https://spark.apache.org/docs/latest/sql-migration-guide.html]:
> {code:title=3.0.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> |   | <empty>|
> +---+--------+
> {code}
> {code:title=3.1.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> |   |        |
> +---+--------+
> {code}
> Note that the regular expression {{^$}} should match the empty string, but it does not in version 3.1.2. For comparison, this is the behavior of Java's {{String.replaceAll}}:
> {code}
> scala> "".replaceAll("^$", "<empty>");
> res1: String = <empty>
> {code}
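
To make the expectation in the report checkable outside Spark, here is a minimal sketch that exercises only java.util.regex. The object name EmptyStringRegexCheck and the helper replaceEmpty are hypothetical names chosen for illustration; they are not part of Spark or of the original report. The sketch shows what the JVM regex engine does with an empty input and does not attempt to reproduce Spark's own regexp_replace implementation.

```scala
import java.util.regex.Pattern

// Hypothetical helper object, used only to illustrate JVM regex semantics.
object EmptyStringRegexCheck {
  // The pattern from the report: start of input immediately followed by end of input.
  val emptyPattern: Pattern = Pattern.compile("^$")

  // Replace every match of ^$ in `input` with the marker used in the report.
  def replaceEmpty(input: String): String =
    emptyPattern.matcher(input).replaceAll("<empty>")

  def main(args: Array[String]): Unit = {
    // The empty string contains exactly one (zero-length) match of ^$.
    println(emptyPattern.matcher("").find())
    // The empty input is replaced by the marker, as in the 3.0.2 output above.
    println(replaceEmpty(""))
    // A non-empty input without line terminators has no match and is returned unchanged.
    println(replaceEmpty("abc"))
  }
}
```

If the 3.0.2 output is the intended behavior, regexp_replace on an empty column value would be expected to agree with replaceEmpty("") here.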
