Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:04:36 UTC

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

     [ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-19248:
---------------------------------
    Labels: bulk-closed  (was: )

> Regex_replace works in 1.6 but not in 2.0
> -----------------------------------------
>
>                 Key: SPARK-19248
>                 URL: https://issues.apache.org/jira/browse/SPARK-19248
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: Lucas Tittmann
>            Priority: Major
>              Labels: bulk-closed
>
> We found an error in how Spark 2.0.2 executes regular expressions. Using PySpark in 1.6.2, we get the following expected behaviour:
> {noformat}
> df = sqlContext.createDataFrame([('..   5.    ',)], ['col'])
> dfout = df.selectExpr(*["regexp_replace(col, '[ \.]*', '') AS col"]).collect()
> z.show(dfout)
> >>> [Row(col=u'5')]
> dfout2 = df.selectExpr(*["regexp_replace(col, '( |\.)*', '') AS col"]).collect()
> z.show(dfout2)
> >>> [Row(col=u'5')]
> {noformat}
> In Spark 2.0.2, with the same code, we get the following:
> {noformat}
> df = sqlContext.createDataFrame([('..   5.    ',)], ['col'])
> dfout = df.selectExpr(*["regexp_replace(col, '[ \.]*', '') AS col"]).collect()
> z.show(dfout)
> >>> [Row(col=u'5')]
> dfout2 = df.selectExpr(*["regexp_replace(col, '( |\.)*', '') AS col"]).collect()
> z.show(dfout2)
> >>> [Row(col=u'')]
> {noformat}
> As you can see, the second regex behaves differently depending on the Spark version. We checked both regexes in Java, and both are valid and should produce the same result. Regex execution in 2.0.2 therefore seems to be erroneous. I am currently unable to confirm whether the problem also occurs in 2.1.
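
One possible explanation, not confirmed anywhere in this report, is that the backslash in the SQL string literal is consumed before the pattern reaches the regex engine, so that '( |\.)*' is evaluated as '( |.)*'. The sketch below illustrates only that hypothesis. It uses Python's re module as a stand-in for the Java regex engine that Spark uses, and the helper name strip_matches and the "unescaped" pattern variants are illustrative assumptions, not Spark code.

```python
import re

# Input value from the report.
value = "..   5.    "

# Patterns as written in the report: the backslash reaches the regex engine.
char_class_escaped = r"[ \.]*"
group_escaped = r"( |\.)*"

# Hypothetical patterns if the backslash were dropped before regex compilation.
# This unescaping step is an assumption, not something the report confirms.
char_class_unescaped = "[ .]*"
group_unescaped = "( |.)*"


def strip_matches(pattern: str, text: str) -> str:
    """Replace every match of `pattern` in `text` with the empty string."""
    return re.sub(pattern, "", text)


for pattern in (char_class_escaped, group_escaped,
                char_class_unescaped, group_unescaped):
    print(f"{pattern!r} -> {strip_matches(pattern, value)!r}")
```

Inside a character class, '.' is a literal dot with or without the backslash, so the first regex would be unaffected by such a change. In the alternation, an unescaped '.' matches any character, so the whole value is removed. That matches the pair of results reported for 2.0.2, although this sketch alone does not show that Spark actually behaves this way.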



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org