Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/05/05 18:03:00 UTC
[jira] [Assigned] (SPARK-39107) Silent change in regexp_replace's handling of empty strings
[ https://issues.apache.org/jira/browse/SPARK-39107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39107:
------------------------------------
Assignee: (was: Apache Spark)
> Silent change in regexp_replace's handling of empty strings
> -----------------------------------------------------------
>
> Key: SPARK-39107
> URL: https://issues.apache.org/jira/browse/SPARK-39107
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: Willi Raschkowski
> Priority: Major
> Labels: correctness
>
> Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change that (a) appears incorrect and (b) is not documented in the [migration guide|https://spark.apache.org/docs/latest/sql-migration-guide.html]:
> {code:title=3.0.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> | | <empty>|
> +---+--------+
> {code}
> {code:title=3.1.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> | | |
> +---+--------+
> {code}
> Note that the regular expression {{^$}} should match the empty string, but no longer does in 3.1.2. For comparison, Java's own {{String.replaceAll}} does match it:
> {code}
> scala> "".replaceAll("^$", "<empty>");
> res1: String = <empty>
> {code}
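The quoted Java comparison can be reproduced as a standalone check against {{java.util.regex}}, with no Spark session required; this sketch only exercises the JDK regex engine whose behavior Spark 3.0.2 agreed with:

```scala
import java.util.regex.Pattern

// ^$ matches the single zero-width position in an empty input,
// so replaceAll substitutes the replacement there.
assert(Pattern.compile("^$").matcher("").matches())
assert("".replaceAll("^$", "<empty>") == "<empty>")

// A non-empty string has no position where both ^ and $ match,
// so the same pattern leaves it untouched.
assert("x".replaceAll("^$", "<empty>") == "x")
```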
--
This message was sent by Atlassian Jira
(v8.20.7#820007)