Posted to issues@flink.apache.org by "Mathias Fußenegger (Jira)" <ji...@apache.org> on 2021/10/29 10:12:00 UTC

[jira] [Created] (FLINK-24702) toUpperCase/toLowerCase calls may cause problems with some system locales

Mathias Fußenegger created FLINK-24702:
------------------------------------------

             Summary: toUpperCase/toLowerCase calls may cause problems with some system locales
                 Key: FLINK-24702
                 URL: https://issues.apache.org/jira/browse/FLINK-24702
             Project: Flink
          Issue Type: Technical Debt
            Reporter: Mathias Fußenegger


I'm currently exploring the code base and saw several toUpperCase and toLowerCase calls on strings that don't explicitly specify a Locale.

Without an explicit Locale, these methods use the system default locale, which can lead to surprising behavior; see [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6208680]

> String.toLowerCase or String.toUpperCase sometimes fails to work when run in a Turkish or Azeri environment [...] The reason is that Turkish and Azeri have dotted and dotless "i"s, and conversion of these characters leads to results that aren't adequate for strings in other languages
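
To make the quoted behavior concrete, here is a minimal sketch that passes the locale explicitly instead of relying on the system default. The class and helper names (TurkishCaseDemo, lower, upper) are made up for illustration and are not taken from the Flink code base. The results are compared against \u escapes so the output does not depend on the console encoding.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Flink.
final class TurkishCaseDemo {

    // Lower-cases s using the case rules of the given locale.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Upper-cases s using the case rules of the given locale.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules, 'I' lower-cases to dotless i (U+0131).
        System.out.println(lower("TITLE", turkish).equals("t\u0131tle"));

        // Under Turkish rules, 'i' upper-cases to dotted capital I (U+0130).
        System.out.println(upper("title", turkish).equals("T\u0130TLE"));

        // Locale.ROOT gives the locale-neutral mapping.
        System.out.println(lower("TITLE", Locale.ROOT).equals("title"));
        System.out.println(upper("title", Locale.ROOT).equals("TITLE"));
    }
}
```

A no-argument toLowerCase() or toUpperCase() call behaves like the Turkish variant above whenever the JVM's default locale is Turkish or Azeri.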

I haven't investigated whether the existing calls are actually problematic, but bugs are possible wherever an .equals() check follows a toUpperCase/toLowerCase call, or where the converted strings are used in map lookups, etc.
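
As a rough illustration of the map-lookup case, the sketch below normalizes a key before looking it up. Everything in it (LocaleSafeLookup, OPTIONS, the "timeout" entry) is hypothetical and only stands in for the kind of code that might be affected. The locale is passed as a parameter to mimic what a no-argument toLowerCase() would do under that default locale, without changing global JVM state.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical example, not taken from Flink.
final class LocaleSafeLookup {

    // Assumed registry keyed by lower-case ASCII option names.
    static final Map<String, Integer> OPTIONS = new HashMap<>();

    static {
        OPTIONS.put("timeout", 30);
    }

    // Mimics key.toLowerCase() when `locale` is the JVM default locale.
    static Integer lookupWithLocale(String key, Locale locale) {
        return OPTIONS.get(key.toLowerCase(locale));
    }

    // Locale-independent variant: always normalizes with Locale.ROOT.
    static Integer lookupSafe(String key) {
        return OPTIONS.get(key.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // "TIMEOUT" becomes "t\u0131meout" under Turkish rules, so the lookup misses.
        System.out.println(lookupWithLocale("TIMEOUT", turkish) == null);

        // With Locale.ROOT the key normalizes to "timeout" and the lookup hits.
        System.out.println(lookupSafe("TIMEOUT"));
    }
}
```

Passing Locale.ROOT (or Locale.ENGLISH) at each call site is the usual fix when the strings are identifiers rather than user-facing text.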

Projects like Lucene use [https://github.com/policeman-tools/forbidden-apis] to prevent these methods from being used and so avoid these potential problems.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)