Posted to dev@tika.apache.org by Uwe Schindler <uw...@thetaphi.de> on 2014/08/07 14:39:52 UTC

RE: svn commit: r1616295 [1/2] - in /tika/trunk: ./ tika-app/src/main/java/org/apache/tika/cli/ tika-app/src/test/java/org/apache/tika/cli/ tika-core/src/main/java/org/apache/tika/detect/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java

> 
>       static class ExifHandler implements DirectoryHandler {
> -        private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
> +        private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.getDefault());
> 
> That looks to be formatting to ISO-8601 format, so should probably be
> using a standard locale not the system default - ISO-8601 is the same
> everywhere!

This is exactly what I meant on the Review Board and in the issue: this should be Locale.ROOT, which formats language-independently. If your computer is set to a Thai locale, the default locale is a big problem: it may not even use ASCII digits anymore!
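A minimal sketch of the difference, assuming Java 7 or later. IsoDateSketch and formatIso are hypothetical names, not Tika's actual ExifHandler code. The time zone is passed in explicitly only so the output is reproducible; a SimpleDateFormat created without setTimeZone uses the JVM default zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: formats a machine-readable, ISO-8601-style timestamp
// independently of the default locale of the machine it runs on.
final class IsoDateSketch {

    // Locale.ROOT gives ASCII digits and the Gregorian calendar everywhere.
    // A new instance per call avoids sharing a non-thread-safe formatter.
    static String formatIso(Date date, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(date);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(formatIso(new Date(0L), utc)); // 1970-01-01T00:00:00

        // For comparison only: the classic th_TH_TH locale variant may switch
        // the calendar and/or the digits, so this output is not guaranteed
        // to be ASCII ISO-8601 at all.
        SimpleDateFormat thai = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", new Locale("th", "TH", "TH"));
        thai.setTimeZone(utc);
        System.out.println(thai.format(new Date(0L)));
    }
}
```

The pattern string alone does not pin down the output; the locale decides which calendar and which digit characters are used, which is why Locale.ROOT belongs in the constructor.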

In general, locales (and charsets) should always be specified explicitly:
- (Error) messages written in English should be formatted with String.format(Locale.ENGLISH, ...).
- Upper-/lowercasing for things like comparisons or hash map lookups should almost always be done with Locale.ROOT.
- Charsets should always be given explicitly, especially when we read resources from our own JAR file; there we should prefer UTF-8.
- Reading from or writing to the console is the only place where you should use the platform default charset (Charset.defaultCharset()).
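The rules above could look roughly like this in code. This is a sketch assuming Java 7+ (for StandardCharsets and try-with-resources); the class and helper names (errorMessage, lookupKey, readUtf8, consoleOut) are hypothetical and not existing Tika APIs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ExplicitLocaleCharsetSketch {

    // English message text: Locale.ENGLISH keeps "50.0" from becoming "50,0".
    static String errorMessage(String file, double ratio) {
        return String.format(Locale.ENGLISH, "Failed to parse %s (%.1f%% done)", file, ratio * 100);
    }

    // Case folding for lookups: Locale.ROOT maps "TITLE" to "title" even
    // when the default locale is Turkish.
    static String lookupKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Resources we ship ourselves: always decode as UTF-8, never with the
    // platform default. Line terminators are normalized to '\n'.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Console output: the one place where the platform default charset fits.
    static PrintStream consoleOut() throws IOException {
        return new PrintStream(System.out, true, Charset.defaultCharset().name());
    }

    public static void main(String[] args) throws IOException {
        PrintStream out = consoleOut();
        out.println(errorMessage("report.doc", 0.5));                      // Failed to parse report.doc (50.0% done)
        out.println(lookupKey("TITLE"));                                    // title
        out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr")));      // dotless i: t\u0131tle
        byte[] bytes = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        out.print(readUtf8(new ByteArrayInputStream(bytes)));
    }
}
```

The Turkish line shows why the default locale is dangerous for lookups: "TITLE" lowercases to "t\u0131tle" (dotless i), so a map keyed on "title" would silently miss.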

Unrelated, but also important: the static SimpleDateFormat above must not be shared by multiple threads, because SimpleDateFormat is not thread-safe!
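One common way to keep a static formatter without sharing an instance between threads is a ThreadLocal, sketched below assuming Java 8+ (for ThreadLocal.withInitial and lambdas). The class name, the ISO_NO_ZONE field and the format helper are hypothetical. On Java 8+, java.time.format.DateTimeFormatter, which is immutable and thread-safe, is usually the simpler choice.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadSafeDateSketch {

    // Hypothetical field: each thread lazily gets its own SimpleDateFormat,
    // so no two threads ever touch the same mutable instance.
    private static final ThreadLocal<SimpleDateFormat> ISO_NO_ZONE =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT));

    // Setting the zone is safe here because the instance belongs to the
    // calling thread only.
    static String format(Date date, TimeZone zone) {
        SimpleDateFormat fmt = ISO_NO_ZONE.get();
        fmt.setTimeZone(zone);
        return fmt.format(date);
    }

    public static void main(String[] args) throws Exception {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int day = 0; day < 8; day++) {
                final long millis = day * 86_400_000L; // midnight UTC, one day apart
                results.add(pool.submit(() -> format(new Date(millis), utc)));
            }
            for (Future<String> f : results) {
                System.out.println(f.get()); // 1970-01-01T00:00:00, 1970-01-02T00:00:00, ...
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-lived thread pool each worker keeps its formatter for the life of the thread, which is usually acceptable for a small object like this.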

Uwe