Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/05/05 18:03:00 UTC

[jira] [Assigned] (SPARK-39107) Silent change in regexp_replace's handling of empty strings

     [ https://issues.apache.org/jira/browse/SPARK-39107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39107:
------------------------------------

    Assignee:     (was: Apache Spark)

> Silent change in regexp_replace's handling of empty strings
> -----------------------------------------------------------
>
>                 Key: SPARK-39107
>                 URL: https://issues.apache.org/jira/browse/SPARK-39107
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Willi Raschkowski
>            Priority: Major
>              Labels: correctness
>
> Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change that (a) seems incorrect and (b) is not documented in the [migration guide|https://spark.apache.org/docs/latest/sql-migration-guide.html]:
> {code:title=3.0.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> |   | <empty>|
> +---+--------+
> {code}
> {code:title=3.1.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> |   |        |
> +---+--------+
> {code}
> Note that the regular expression {{^$}} should match the empty string, but in 3.1.2 it does not. For comparison, this is the behavior of Java's {{String.replaceAll}}:
> {code}
> scala> "".replaceAll("^$", "<empty>");
> res1: String = <empty>
> {code}
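
The Java comparison above can also be reproduced outside the REPL and without Spark. The following is a minimal sketch that assumes only the JDK's java.util.regex package; the object name EmptyStringRegexCheck and its replaceAll helper are hypothetical names chosen for illustration and are not part of Spark. It shows only the plain JVM regex behavior the report expects regexp_replace to match, not how Spark implements regexp_replace.

```scala
import java.util.regex.Pattern

// Hypothetical helper for illustration only; not part of Spark.
object EmptyStringRegexCheck {

  // Applies java.util.regex directly, the same engine behind String.replaceAll.
  def replaceAll(input: String, regex: String, replacement: String): String =
    Pattern.compile(regex).matcher(input).replaceAll(replacement)

  def main(args: Array[String]): Unit = {
    // The empty string matches ^$, so the whole (empty) input is replaced.
    println(replaceAll("", "^$", "<empty>"))

    // A non-empty string has no position where ^ and $ both hold, so it is unchanged.
    println(replaceAll("abc", "^$", "<empty>"))
  }
}
```

Running the object prints the replaced value for the empty input and the unchanged value for the non-empty input, which is the 3.0.2 result shown in the report.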



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
