You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@metron.apache.org by "Michael Miklavcic (Jira)" <ji...@apache.org> on 2019/08/28 18:17:00 UTC

[jira] [Assigned] (METRON-614) Eliminate use of the default Charset

     [ https://issues.apache.org/jira/browse/METRON-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Miklavcic reassigned METRON-614:
----------------------------------------

    Assignee: Justin Leet  (was: Michael Miklavcic)

> Eliminate use of the default Charset
> ------------------------------------
>
>                 Key: METRON-614
>                 URL: https://issues.apache.org/jira/browse/METRON-614
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Minor
>             Fix For: Next + 1
>
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> During work on METRON-612, I noticed a lot of our warnings are from implicit use of the default Charset (e.g. when creating InputStreamReaders).  Given that this value is platform dependent, we should at least consider explicitly using a particular Charset.
> In addition, after fixing this, I would like to see it upgraded to a compile error, because in my experience almost everyone tends to use methods that use the implicit default Charset.  This can be done by changing the compiler arg in the pom to instead be
> {code}
> <compilerArgs>
>   <arg>-Xlint:unchecked</arg>
>   <arg>-Xep:DefaultCharset:ERROR</arg>
> </compilerArgs>
> {code}
> My assumption is that we'll want UTF-8, but given a lack of expertise and that usually there's a lot of opinions on character set issues, we may want to consider other options.
> http://errorprone.info/bugpattern/DefaultCharset
> https://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html lists standard Charsets and provides more details.
> {code}
> Charset	Description
> US-ASCII	Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
> ISO-8859-1  	ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
> UTF-8	Eight-bit UCS Transformation Format
> UTF-16BE	Sixteen-bit UCS Transformation Format, big-endian byte order
> UTF-16LE	Sixteen-bit UCS Transformation Format, little-endian byte order
> UTF-16	Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)