Posted to issues@spark.apache.org by "Nihar Sheth (JIRA)" <ji...@apache.org> on 2018/08/24 21:52:00 UTC
[jira] [Commented] (SPARK-25230) Upper behavior incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592216#comment-16592216 ]
Nihar Sheth commented on SPARK-25230:
-------------------------------------
This seems to be JVM behavior rather than anything Spark-specific: [https://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toUpperCase%28java.util.Locale%29]. In Java/Scala, toUpperCase converts "ß" to "SS" in every locale.
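To illustrate the JVM behavior described above, here is a minimal sketch that uppercases the string from the issue under a few locales. The class name UpperEszett and the helper upper are hypothetical names chosen for this example. It calls only the JDK's String.toUpperCase(Locale) and does not go through Spark's own code path.

```java
import java.util.Locale;

public class UpperEszett {
    // Hypothetical helper: delegates to the JDK's String.toUpperCase(Locale).
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        String name = "Ha\u00DFler"; // "Haßler", written with an escape to avoid source-encoding issues
        Locale[] locales = {Locale.ROOT, Locale.ENGLISH, Locale.GERMAN};
        for (Locale locale : locales) {
            // Each line is expected to end in HASSLER, matching the linked Javadoc.
            System.out.println(locale.toLanguageTag() + " -> " + upper(name, locale));
        }
    }
}
```

The locale list here is only a sample; the linked Javadoc lists the "ß" to "SS" mapping as applying to all locales.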
From a quick check, MySQL, PostgreSQL, and SQLite all leave the character unchanged, while spark-sql and WebSQL convert it to SS. If a fix is essential, it might just come down to replacing "ß" with a placeholder value, performing the uppercasing, and then substituting it back in.
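The placeholder idea could look roughly like the sketch below. This is plain Java against java.lang.String, not a patch to Spark's upper implementation. The class name, the method name upperKeepingEszett, and the choice of the private-use code point U+E000 as the placeholder are all assumptions made for illustration. The sketch rejects input that already contains the placeholder, since the round trip would otherwise corrupt it.

```java
import java.util.Locale;

public class UpperKeepEszett {
    // Assumption: U+E000 (a private-use code point with no case mapping) does not occur in real input.
    private static final char PLACEHOLDER = '\uE000';
    private static final char ESZETT = '\u00DF'; // "ß"

    // Hypothetical helper: uppercases s while leaving every "ß" unchanged.
    static String upperKeepingEszett(String s) {
        if (s.indexOf(PLACEHOLDER) >= 0) {
            throw new IllegalArgumentException("input already contains the placeholder character");
        }
        return s.replace(ESZETT, PLACEHOLDER)   // 1. hide "ß" behind the placeholder
                .toUpperCase(Locale.ROOT)       // 2. uppercase everything else
                .replace(PLACEHOLDER, ESZETT);  // 3. restore "ß"
    }

    public static void main(String[] args) {
        System.out.println(upperKeepingEszett("Ha\u00DFler"));
        System.out.println(upperKeepingEszett("Hassler"));
    }
}
```

With this approach the two names from the issue description would no longer collapse into the same group, at the cost of two extra passes over each string.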
> Upper behavior incorrect for string contains "ß"
> ------------------------------------------------
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Yuming Wang
> Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
> !MySQL.png!
>
> This behavior may lead to data inconsistency:
> {code:sql}
> create temporary view SPARK_25230 as select * from values
> ("Hassler"),
> ("Haßler")
> as EMPLOYEE(name);
> select UPPER(name) from SPARK_25230 group by 1;
> -- result
> HASSLER
> {code}