Posted to dev@commons.apache.org by Ed Korthof <ed...@apache.org> on 2002/02/06 04:07:42 UTC

Re: [PATCH] org.apache.commons.util.StringUtils charset handling

As it works out, I've since found that there's a much simpler idiom for
doing encoding transformations -- the String class is the third location
(there are only three) where this can be done, and using it will
greatly simplify code which needs to do string transforms.

The proper way to do them is something like (assuming that your JVM's
native encoding is ISO-8859-1, which makes certain things much easier):

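// \u001b is ESC (0x1B), which starts each ISO-2022-JP escape sequence.
String nativeSource = "\u001b$B$3$l$OF|K\\8l$N%F%9%H$G$9!#\u001b(B";
String encoding = "iso-2022-jp";
// Recover the raw bytes, then decode them using the source encoding.
byte[] nativeBytes = nativeSource.getBytes("ISO-8859-1");
String unicode = new String(nativeBytes, encoding);
// getBytes returns a byte[], not a String.
byte[] outputNativeEncoding = unicode.getBytes("EUC_JP");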

and the like (this is specific to two Japanese encodings, obviously).
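
For anyone who wants something to compile, here is a minimal sketch of the
same decode-then-encode idiom as a standalone class. The class name
TranscodeSketch and the transcode helper are made up for illustration and
are not part of StringUtils. It also assumes a JDK recent enough to have
the java.nio.charset.Charset overloads of the String constructor and
getBytes, and it checks that the Japanese charsets are present, since not
every runtime ships them.

```java
import java.nio.charset.Charset;

class TranscodeSketch {

    // Hypothetical helper: decode bytes from one charset into a String
    // (Unicode internally), then encode that String in another charset.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        String unicode = new String(input, from);
        return unicode.getBytes(to);
    }

    public static void main(String[] args) {
        // ISO-2022-JP and EUC-JP are optional charsets, so check first.
        if (!Charset.isSupported("ISO-2022-JP") || !Charset.isSupported("EUC-JP")) {
            System.err.println("Japanese charsets are not available in this runtime");
            return;
        }
        Charset jis = Charset.forName("ISO-2022-JP");
        Charset euc = Charset.forName("EUC-JP");

        // Three kanji spelling "nihongo" (the Japanese language).
        String text = "\u65e5\u672c\u8a9e";
        byte[] jisBytes = text.getBytes(jis);
        byte[] eucBytes = transcode(jisBytes, jis, euc);

        // Decoding the EUC-JP bytes should give back the original text.
        System.err.println(text.equals(new String(eucBytes, euc)));
    }
}
```

Note that new String(bytes, charset) quietly substitutes a replacement
character for malformed input rather than throwing, so munged data will
not fail loudly with this approach.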

The error you saw shouldn't happen -- perhaps the data got munged in
some way during transport.  I don't know what to say about it (I'd be
happy to look if you're curious), but I'd say that it'd probably be
better to remove these methods, since they're not the correct idiom for
string transformations.  Sorry for sending in that patch.

thanks --

Ed

On Fri, Dec 28, 2001 at 05:01:01PM -0800, Daniel Rall wrote:
> Hi Ed.  I committed the change to StringUtils with some minor mods.  I
> couldn't get the test case to run and am unsure of what the output should
> look like, but I assume this is a problem with my local environment.
> 
> dlr@despot:util$ java -classpath /tmp:commons-util-0.1-dev.jar test
> Exception in thread "main" sun.io.MalformedInputException
> 	at sun.io.ByteToCharISO2022JP.flush(ByteToCharISO2022JP.java:40)
> 	at java.io.InputStreamReader.flushInto(InputStreamReader.java:154)
> 	at java.io.InputStreamReader.fill(InputStreamReader.java:178)
> 	at java.io.InputStreamReader.read(InputStreamReader.java:249)
> 	at org.apache.commons.util.StringUtils.convertNativeToUnicode(StringUtils.java)
> 	at test.main(test.java:8)
> 
> I checked the test code into the appropriate JUnit test, commented
> out.  If you could hook that up as an assertion, it would be helpful
> (there is an example right above in the Java source file).
> 
> Ed Korthof <ed...@apache.org> writes:
> 
> > On Thu, Dec 13, 2001 at 10:53:01PM -0800, Ed Korthof wrote:
> >
> > import org.apache.commons.util.StringUtils;
> >
> > public class test
> > {
> >     public static void main(String argv[]) throws Exception
> >     {
> >         String input = "$B$3$l$OF|K\\8l$N%F%9%H$G$9!#(B";
> >         String unicode = StringUtils.convertInputString(input, "iso-2022-jp");
> >         String iso = StringUtils.convertOutputString(unicode, "iso-2022-jp");
> >         System.err.println(input);
> >         System.err.println(unicode);
> >         System.err.println(iso);
> >     }
> > }
> 
> --
> To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
> For additional commands, e-mail: <ma...@jakarta.apache.org>