Posted to dev@commons.apache.org by Leandro Reis <lr...@adobe.com> on 2015/03/02 21:00:13 UTC

[io] support for additional character sets needed in ReversedLinesFileReader

Hi all,

I'm working on a product that uses Commons IO via Jackrabbit Oak. While
testing the launch of this product on Japanese Windows Server 2012 R2, I
came across the following exception:
"(java.io.UnsupportedEncodingException: Encoding windows-31j is not
supported yet (feel free to submit a patch))"

windows-31j is the IANA name for Windows code page 932 (Japanese), and is
returned by Charset.defaultCharset(), used in
org.apache.commons.io.input.ReversedLinesFileReader [0].


It looks like this issue could be addressed by adding a check for
"windows-31j" to ReversedLinesFileReader(final File file, final int
blockSize, final Charset encoding):


...
} else if(charset.equals(Charset.forName("windows-31j"))) {
    byteDecrement = 1;
}
...

Similar changes would be needed in order to support the Chinese
Simplified, Chinese Traditional, and Korean versions of the same OS (I'm
checking what the corresponding encoding names are).
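
As a rough sketch of how those names could be checked on a given JVM, the
snippet below asks java.nio.charset.Charset for the canonical name and
aliases of a few candidates. The class name CharsetNameLookup, the helper
describe(), and the candidate names other than windows-31j are assumptions
made for illustration; they are unverified guesses, not confirmed names
for the Chinese and Korean Windows code pages.

```java
import java.nio.charset.Charset;

class CharsetNameLookup {

    /**
     * Describes a charset name as seen by this JVM: its canonical name and
     * aliases, or a note when the name is unknown or malformed.
     */
    public static String describe(String name) {
        try {
            if (!Charset.isSupported(name)) {
                return name + ": not supported by this JVM";
            }
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException for malformed names.
            return name + ": illegal charset name";
        }
        Charset cs = Charset.forName(name);
        return name + " -> " + cs.name() + " " + cs.aliases();
    }

    public static void main(String[] args) {
        // Only windows-31j comes from the exception above; the other
        // entries are hypothetical candidates to be confirmed.
        String[] candidates = {"windows-31j", "GBK", "x-windows-950", "x-windows-949"};
        for (String candidate : candidates) {
            System.out.println(describe(candidate));
        }
        System.out.println("default: " + Charset.defaultCharset().name());
    }
}
```

Running this on each localized OS would show which canonical name
Charset.defaultCharset() actually reports there.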

Can someone familiar with this area of the code confirm this looks like
the proper approach to addressing this?

Thanks,
Leandro

[0] http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/input/ReversedLinesFileReader.java?view=markup


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [io] support for additional character sets needed in ReversedLinesFileReader

Posted by sebb <se...@gmail.com>.
On 2 March 2015 at 20:00, Leandro Reis <lr...@adobe.com> wrote:
> Hi all,
>
> I'm working on a product that uses Commons IO via Jackrabbit Oak. While
> testing the launch of this product on Japanese Windows Server 2012 R2, I
> came across the following exception:
> "(java.io.UnsupportedEncodingException: Encoding windows-31j is not
> supported yet (feel free to submit a patch))"
>
> windows-31j is the IANA name for Windows code page 932 (Japanese), and is
> returned by Charset.defaultCharset(), used in
> org.apache.commons.io.input.ReversedLinesFileReader [0].
>
>
> It looks like this issue could be addressed by adding a check for
> "windows-31j" to ReversedLinesFileReader(final File file, final int
> blockSize, final Charset encoding):
>
>
> ...
> } else if(charset.equals(Charset.forName("windows-31j"))) {
>     byteDecrement = 1;
> }
> ...
>
> Similar changes would be needed in order to support the Chinese
> Simplified, Chinese Traditional, and Korean versions of the same OS (I'm
> checking what the corresponding encoding names are).
>
> Can someone familiar with this area of the code confirm this looks like
> the proper approach to addressing this?

Can a newline byte ever appear as part of a multi-byte character in
any of those encodings?
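
One way to probe this empirically, as a minimal sketch rather than a proof:
encode every non-surrogate BMP character with the charset in question and
look for an LF (0x0A) or CR (0x0D) byte inside any multi-byte result. The
class name NewlineByteCheck and the method newlineInsideMultiByte are
hypothetical names made up for this example. The check covers only single
BMP characters, so it says nothing about supplementary characters or about
stateful encodings that emit shift sequences.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class NewlineByteCheck {

    /**
     * Returns true if some non-surrogate BMP character encodes to two or
     * more bytes, at least one of which is 0x0A (LF) or 0x0D (CR).
     */
    public static boolean newlineInsideMultiByte(Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // lone surrogates cannot be encoded on their own
            }
            ByteBuffer bytes;
            try {
                // encode(CharBuffer) resets the encoder before each call.
                bytes = encoder.encode(CharBuffer.wrap(new char[] {(char) c}));
            } catch (CharacterCodingException e) {
                continue; // character has no mapping in this charset
            }
            if (bytes.remaining() < 2) {
                continue; // single-byte result, not the case being asked about
            }
            while (bytes.hasRemaining()) {
                byte b = bytes.get();
                if (b == 0x0A || b == 0x0D) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "UTF-16LE", "windows-31j"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + newlineInsideMultiByte(Charset.forName(name)));
            } else {
                System.out.println(name + ": not supported by this JVM");
            }
        }
    }
}
```

A result of false for a charset would suggest that scanning backwards for
newline bytes cannot land in the middle of a character, at least for the
characters covered by this check.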

> Thanks,
> Leandro
>
> [0] http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/input/ReversedLinesFileReader.java?view=markup