
Posted to dev@commons.apache.org by Ed Korthof <ed...@apache.org> on 2001/12/14 07:53:01 UTC

[PATCH] org.apache.commons.util.StringUtils charset handling

Hi --

The attached patch includes a couple of functions which may be useful
for people dealing with i18n issues.  Daniel R. indicated that this
might be the most reasonable place for it.  (Functionality more or less
like this is going to be necessary in fulcrum, to handle multi-byte form
input -- and I'm now using one of these functions for an app based
around javax.mail.)  The patch is from jakarta-commons-sandbox/util/.

Questions, comments, and criticism are all welcome.  I think I may have
commit privs for jakarta (from a while back), but I'd rather have some
folks take a look at my patches, at least at first.

I've attached test code as well, to demonstrate this.

thanks --

Ed


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: [PATCH] org.apache.commons.util.StringUtils charset handling

Posted by Ed Korthof <ed...@apache.org>.

As it works out, I've since found that there's a much simpler idiom for
doing encoding transformations -- the String class is the third place
(there are only three) where this can be done, and using it will
greatly simplify code which needs to do string transforms.

The proper way to do them is something like (assuming that your JVM's
native encoding is ISO-8859-1, which makes certain things much easier):

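// \u001b is the ESC character that starts each ISO-2022-JP escape sequence.
String nativeSource = "\u001b$B$3$l$OF|K\\8l$N%F%9%H$G$9!#\u001b(B";
String encoding = "iso-2022-jp";
// Recover the raw bytes (one byte per char under ISO-8859-1), then decode them.
String unicode = new String(nativeSource.getBytes("ISO-8859-1"), encoding);
// getBytes() returns a byte array, not a String.
byte[] outputNativeEncoding = unicode.getBytes("EUC_JP");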

and the like (this is specific to two Japanese encodings, obviously).
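
For what it's worth, here is a rough, self-contained sketch of the same
idiom working on byte arrays directly.  The class and method names
(CharsetRoundTrip, transcode) are made up for illustration and are not
part of StringUtils, and it assumes a JVM which ships the ISO-2022-JP
and EUC-JP charsets and has String's Charset overloads:

```java
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    /**
     * Decodes the input bytes using the "from" charset, then re-encodes
     * the resulting (Unicode) String using the "to" charset.
     * Hypothetical helper, for illustration only.
     */
    public static byte[] transcode(byte[] input, String from, String to) {
        String unicode = new String(input, Charset.forName(from));
        return unicode.getBytes(Charset.forName(to));
    }

    public static void main(String[] args) {
        // Three kanji written as Unicode escapes, so the source file's
        // own encoding doesn't matter.
        String text = "\u65e5\u672c\u8a9e";

        // Encode to ISO-2022-JP, then transcode those bytes to EUC-JP.
        byte[] jis = text.getBytes(Charset.forName("ISO-2022-JP"));
        byte[] euc = transcode(jis, "ISO-2022-JP", "EUC-JP");

        // Decoding the EUC-JP bytes should give back the original text.
        String back = new String(euc, Charset.forName("EUC-JP"));
        System.out.println("round trip ok: " + back.equals(text));
    }
}
```

Going through byte arrays means the data never has to sit in a String
under the wrong encoding, so the JVM's native encoding stops mattering.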

The error you saw shouldn't happen -- perhaps the data got munged in
some way during transport.  I don't know what to say about it (I'd be
happy to look if you're curious), but I'd say that it'd probably be
better to remove these methods, since they're not the correct idiom for
string transformations.  Sorry for sending in that patch.

thanks --

Ed

On Fri, Dec 28, 2001 at 05:01:01PM -0800, Daniel Rall wrote:
> Hi Ed.  I committed the change to StringUtils with some minor mods.  I
> couldn't get the test case to run and am unsure of what the output should
> look like, but I assume this is a problem with my local environment.
> 
> dlr@despot:util$ java -classpath /tmp:commons-util-0.1-dev.jar test
> Exception in thread "main" sun.io.MalformedInputException
> 	at sun.io.ByteToCharISO2022JP.flush(ByteToCharISO2022JP.java:40)
> 	at java.io.InputStreamReader.flushInto(InputStreamReader.java:154)
> 	at java.io.InputStreamReader.fill(InputStreamReader.java:178)
> 	at java.io.InputStreamReader.read(InputStreamReader.java:249)
> 	at org.apache.commons.util.StringUtils.convertNativeToUnicode(StringUtils.java)
> 	at test.main(test.java:8)
> 
> I checked the test code into the appropriate JUnit test, commented
> out.  If you could hook that up as an assertion, it would be helpful
> (there is an example right above in the Java source file).
> 
> Ed Korthof <ed...@apache.org> writes:
> 
> > On Thu, Dec 13, 2001 at 10:53:01PM -0800, Ed Korthof wrote:
> >
> > import org.apache.commons.util.StringUtils;
> >
> > public class test
> > {
> >     public static void main(String argv[]) throws Exception
> >     {
> >         String input = "$B$3$l$OF|K\\8l$N%F%9%H$G$9!#(B";
> >         String unicode = StringUtils.convertInputString(input, "iso-2022-jp");
> >         String iso = StringUtils.convertOutputString(unicode, "iso-2022-jp");
> >         System.err.println(input);
> >         System.err.println(unicode);
> >         System.err.println(iso);
> >     }
> > }


Re: [PATCH] org.apache.commons.util.StringUtils charset handling

Posted by Daniel Rall <dl...@finemaltcoding.com>.

Hi Ed.  I committed the change to StringUtils with some minor mods.  I
couldn't get the test case to run and am unsure of what the output should
look like, but I assume this is a problem with my local environment.

dlr@despot:util$ java -classpath /tmp:commons-util-0.1-dev.jar test
Exception in thread "main" sun.io.MalformedInputException
	at sun.io.ByteToCharISO2022JP.flush(ByteToCharISO2022JP.java:40)
	at java.io.InputStreamReader.flushInto(InputStreamReader.java:154)
	at java.io.InputStreamReader.fill(InputStreamReader.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:249)
	at org.apache.commons.util.StringUtils.convertNativeToUnicode(StringUtils.java)
	at test.main(test.java:8)

I checked the test code into the appropriate JUnit test, commented
out.  If you could hook that up as an assertion, it would be helpful
(there is an example right above in the Java source file).

Ed Korthof <ed...@apache.org> writes:

> On Thu, Dec 13, 2001 at 10:53:01PM -0800, Ed Korthof wrote:
>
> import org.apache.commons.util.StringUtils;
>
> public class test
> {
>     public static void main(String argv[]) throws Exception
>     {
>         String input = "$B$3$l$OF|K\\8l$N%F%9%H$G$9!#(B";
>         String unicode = StringUtils.convertInputString(input, "iso-2022-jp");
>         String iso = StringUtils.convertOutputString(unicode, "iso-2022-jp");
>         System.err.println(input);
>         System.err.println(unicode);
>         System.err.println(iso);
>     }
> }


Re: [PATCH] org.apache.commons.util.StringUtils charset handling

Posted by Ed Korthof <ed...@apache.org>.

On Thu, Dec 13, 2001 at 10:53:01PM -0800, Ed Korthof wrote:
> I've attached test code as well, to demonstrate this.

<sigh> attachments included, this time.

cheers --

Ed