Posted to issues@spark.apache.org by "Lorenzo Martini (Jira)" <ji...@apache.org> on 2023/01/03 15:22:00 UTC
[jira] [Updated] (SPARK-39107) Silent change in regexp_replace's handling of empty strings
[ https://issues.apache.org/jira/browse/SPARK-39107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lorenzo Martini updated SPARK-39107:
------------------------------------
Issue Type: Bug (was: Improvement)
> Silent change in regexp_replace's handling of empty strings
> -----------------------------------------------------------
>
> Key: SPARK-39107
> URL: https://issues.apache.org/jira/browse/SPARK-39107
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: Willi Raschkowski
> Assignee: Lorenzo Martini
> Priority: Major
> Labels: correctness, release-notes
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change that a) seems incorrect, and b) is undocumented in the [migration guide|https://spark.apache.org/docs/latest/sql-migration-guide.html]:
> {code:title=3.0.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> | | <empty>|
> +---+--------+
> {code}
> {code:title=3.1.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> | | |
> +---+--------+
> {code}
> Note that the regular expression {{^$}} matches the empty string, but in version 3.1 the replacement is not applied. For comparison, this is the Java behavior:
> {code}
> scala> "".replaceAll("^$", "<empty>");
> res1: String = <empty>
> {code}
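> As a standalone check (outside the spark-shell, assuming only the JDK), the same java.util.regex semantics can be verified directly; both the {{replaceAll}} call and an explicit {{Pattern}}/{{Matcher}} agree that {{^$}} matches the empty string:
> {code}
import java.util.regex.Pattern;

public class EmptyRegexCheck {
    public static void main(String[] args) {
        // ^$ matches the empty input once (a zero-width match at position 0),
        // so replaceAll substitutes exactly one occurrence.
        String replaced = "".replaceAll("^$", "<empty>");
        System.out.println(replaced); // prints "<empty>"

        // The same result via an explicit Pattern/Matcher:
        boolean matches = Pattern.compile("^$").matcher("").find();
        System.out.println(matches); // prints "true"
    }
}
> {code}
> Since Spark's regexp_replace delegates to java.util.regex, the 3.1 output above diverges from the underlying regex engine's behavior.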