
Posted to dev@commons.apache.org by Emmanuel Bourg <eb...@apache.org> on 2012/03/07 17:12:37 UTC

Re: [io] Unicode escape/unescape Writer/Reader

I now have an implementation ready for the reader in the [csv] source code:

https://svn.apache.org/repos/asf/commons/sandbox/csv/trunk/src/main/java/org/apache/commons/csv/UnicodeUnescapeReader.java

I think I'll also handle other escape sequences such as \n or \t.
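
For readers without access to the repository, here is a minimal sketch of how such a reader could work. It is not the class linked above: the name UnicodeUnescapeSketch and the unescape() helper are made up for illustration, and it assumes an invalid sequence is passed through unchanged. It handles only backslash-u sequences (not \n or \t) and does not treat a doubled backslash specially.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch; not the UnicodeUnescapeReader from the [csv] sandbox. */
final class UnicodeUnescapeSketch extends Reader {
    // Pushback capacity of 5: the 'u' plus four hex digits of a rejected sequence.
    private final PushbackReader source;

    UnicodeUnescapeSketch(Reader reader) {
        this.source = new PushbackReader(reader, 5);
    }

    @Override
    public int read() throws IOException {
        int c = source.read();
        if (c != '\\') {
            return c; // ordinary character or end of stream
        }
        // Look ahead at up to five characters following the backslash.
        char[] buf = new char[5];
        int n = 0;
        while (n < buf.length) {
            int r = source.read();
            if (r == -1) {
                break;
            }
            buf[n++] = (char) r;
        }
        if (n == buf.length && buf[0] == 'u') {
            int code = 0;
            boolean valid = true;
            for (int i = 1; i < buf.length; i++) {
                int digit = hexValue(buf[i]);
                if (digit < 0) {
                    valid = false;
                    break;
                }
                code = (code << 4) | digit;
            }
            if (valid) {
                return code; // the decoded UTF-16 code unit
            }
        }
        // Assumption: an invalid sequence is left in the stream as-is.
        source.unread(buf, 0, n);
        return '\\';
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            cbuf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /** Convenience helper for trying the reader on a string. */
    static String unescape(String text) {
        StringBuilder result = new StringBuilder();
        try (Reader reader = new UnicodeUnescapeSketch(new StringReader(text))) {
            int c;
            while ((c = reader.read()) != -1) {
                result.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }
}
```

Wrapping the source in a PushbackReader keeps the lookahead simple: when the five characters after a backslash do not form a valid escape, they are pushed back and read again as ordinary text.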

Emmanuel Bourg


On 12/11/2011 00:27, Emmanuel Bourg wrote:
> Hi,
>
> It seems that unescaping unicode escape sequences (\u1234) in an input
> stream is a common need. [configuration] does it for
> PropertiesConfiguration, and [csv] can also decode these sequences
> optionally.
>
> In the other direction, there is also a need to escape unicode
> characters not supported by a given encoding when writing (see
> CONFIGURATION-457).
>
> I think these features could be implemented as a UnicodeUnescapeReader
> and a UnicodeEscapeWriter that might fit into [io].
>
> For the reader, any unicode escape sequence would be transformed into
> the corresponding unicode character, or ignored if the sequence is not
> valid.
>
> For the writer, a target charset would be specified in the constructor,
> and any character not supported by this charset would be turned into
> \uxxxx.
>
> What do you think?
>
> Emmanuel Bourg
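
The writer described in the proposal could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name UnicodeEscapeWriterSketch and the escape() helper are hypothetical, and the choice to escape a supplementary character as two backslash-u sequences (one per surrogate) is an assumption borrowed from the Java properties format. It relies on CharsetEncoder.canEncode() to decide whether the target charset supports a character.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical sketch of a writer that escapes what the target charset cannot encode. */
final class UnicodeEscapeWriterSketch extends FilterWriter {
    private final CharsetEncoder encoder;
    private char pendingHigh; // high surrogate waiting for its partner, 0 if none

    UnicodeEscapeWriterSketch(Writer out, Charset target) {
        super(out);
        this.encoder = target.newEncoder();
    }

    @Override
    public void write(int c) throws IOException {
        char ch = (char) c;
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0;
            if (Character.isLowSurrogate(ch)) {
                String pair = new String(new char[] {high, ch});
                if (encoder.canEncode(pair)) {
                    out.write(pair);
                } else {
                    writeEscaped(high);
                    writeEscaped(ch);
                }
                return;
            }
            writeEscaped(high); // unpaired high surrogate
        }
        if (Character.isHighSurrogate(ch)) {
            pendingHigh = ch; // decide once the next character is known
        } else if (encoder.canEncode(ch)) {
            out.write(ch);
        } else {
            writeEscaped(ch);
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            write(cbuf[off + i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            write(str.charAt(off + i));
        }
    }

    @Override
    public void close() throws IOException {
        if (pendingHigh != 0) {
            writeEscaped(pendingHigh); // dangling surrogate at end of output
            pendingHigh = 0;
        }
        super.close();
    }

    // Writes a backslash, 'u', and four lowercase hex digits.
    private void writeEscaped(char ch) throws IOException {
        out.write(String.format("\\u%04x", (int) ch));
    }

    /** Convenience helper for trying the writer on a string. */
    static String escape(String text, Charset target) {
        StringWriter buffer = new StringWriter();
        try (Writer writer = new UnicodeEscapeWriterSketch(buffer, target)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }
}
```

With ISO-8859-1 as the target, the euro sign would be escaped while accented Latin letters pass through unchanged; with UTF-8 nothing well-formed needs escaping.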

Re: [io] Unicode escape/unescape Writer/Reader

Posted by Emmanuel Bourg <eb...@apache.org>.

On 07/03/2012 17:56, Honton, Charles wrote:

> Isn't this performing the function of a java.nio.charset.CharsetDecoder or
> an org.apache.commons.codec.StringDecoder?

I don't think so. CharsetDecoder works at the binary level to transform 
bytes into characters according to a specific charset. And StringDecoder 
is just an interface.
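
To illustrate the difference, here is a small sketch (the class and method names are made up for this example). A CharsetDecoder turns bytes into characters, so the ASCII bytes of a backslash-u sequence come out as the same literal characters; nothing is unescaped.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class: decoding bytes is not the same as unescaping text. */
final class DecoderVersusUnescape {

    /** Decodes UTF-8 bytes into a String using a CharsetDecoder. */
    static String decodeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        // Ten ASCII bytes: c, a, f, backslash, u, 0, 0, e, 9.
        byte[] escapedText = "caf\\u00e9".getBytes(StandardCharsets.US_ASCII);
        String decoded = decodeUtf8(escapedText);
        // The decoder leaves the escape sequence untouched.
        System.out.println(decoded.length());

        // Five bytes: c, a, f, and the two-byte UTF-8 form of e-acute.
        byte[] utf8Text = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeUtf8(utf8Text).length());
    }
}
```

The unescaping step proposed in this thread would operate on the decoder's output, that is, on characters rather than bytes.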

Emmanuel Bourg

Re: [io] Unicode escape/unescape Writer/Reader

Posted by "Honton, Charles" <Ch...@intuit.com>.

Emmanuel,

Isn't this performing the function of a java.nio.charset.CharsetDecoder or
an org.apache.commons.codec.StringDecoder?

Regards,
Chas Honton



