Posted to dev@poi.apache.org by "Andrew C. Oliver" <ac...@apache.org> on 2003/09/03 22:18:14 UTC

Re: unicode support

We've done detection before... it's always a huge performance and memory hog.
In any case, I don't think this method has been tried.  So go for it and see
what happens.

-Andy

On 9/3/03 5:18 PM, "A. Rothman" <am...@amichais.net> wrote:

> Hey guys,
> 
> After experiencing some Unicode trouble with HSSF today (namely, having to set
> the cell encoding before setting its text, or forgetting to), I figured that
> since Java is Unicode-based, it's not very friendly to require users to set
> Unicode flags explicitly in order for Unicode to work. I traced the problem
> down to UnicodeString.serialize(), and found some rather strange code that
> seems to do nothing (decomposing and recreating a string, try and catch blocks
> that are identical... does anyone have any ideas?), and thought we should have
> the serializer (or perhaps the constructor?) detect the case where the String
> contains non-ASCII/ISO-Latin-1 chars and set the encoding automatically. This
> can be as simple as a check like
> 
> if (str.equals(new String(str.getBytes("iso8859_1"), "iso8859_1"))) // string can be compressed
> ...
> 
> What do you say? Any implications I didn't think of?
> 
> 
> -Amichai
> 

-- 
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI

http://jakarta.apache.org/poi
For Java and Excel, Got POI?

The views expressed in this email are those of the author and are almost
definitely not shared by the Apache Software Foundation, its board or its
general membership.  In fact they probably most definitively disagree with
everything espoused in the above email.


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-dev-help@jakarta.apache.org
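The round-trip check proposed in the quoted message can be sketched as below, next to an allocation-free alternative that speaks to the performance and memory concern Andy raises. This is a minimal sketch assuming Java 7+ (for StandardCharsets); the class and method names are hypothetical, not part of POI. It uses getBytes(Charset) rather than the thread's getBytes("iso8859_1"), because only the Charset overload documents that unmappable characters are replaced (with '?' for ISO-8859-1); the String-name overload leaves that behavior unspecified.

```java
import java.nio.charset.StandardCharsets;

public class CompressibleCheck {

    // Round-trip test from the thread: encode to ISO-8859-1 and decode back.
    // Characters outside ISO-8859-1 are replaced by '?' when encoding, so the
    // result equals the input only if every char is in ISO-8859-1.
    public static boolean canCompressRoundTrip(String str) {
        byte[] latin1 = str.getBytes(StandardCharsets.ISO_8859_1);
        return str.equals(new String(latin1, StandardCharsets.ISO_8859_1));
    }

    // Allocation-free alternative: a string can be stored as 8-bit (compressed)
    // chars only if every UTF-16 code unit fits in a single byte (<= 0xFF).
    public static boolean canCompressScan(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"plain ascii", "caf\u00e9", "\u05e9\u05dc\u05d5\u05dd", "what?"};
        for (String s : samples) {
            // Both checks should agree on every input
            System.out.println(canCompressRoundTrip(s) + " " + canCompressScan(s));
        }
    }
}
```

The scan stops at the first wide character and allocates nothing, whereas the round trip always builds a byte array and a second String before comparing.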


Re: unicode support

Posted by "A. Rothman" <am...@amichais.net>.
1. How do I post the changes (and the new TestUnicodeString class)?

2. The contract of the SSTRecord method addString( final String string, final boolean useUTF16 )
will change a bit: passing true will still force UTF-16, but passing false will only
allow the 8-bit (compressed) representation if the string can be encoded as 8-bit
chars; otherwise the string will be stored as 16-bit as well. I hope this doesn't
have any implications (except for progress :-) ).
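The contract change described in point 2 can be sketched as a small decision function. This is an illustration only: needsSixteenBit and encodeChars are hypothetical helpers, not SSTRecord's actual code, and they assume (as in BIFF8) that compressed strings store one byte per character while uncompressed strings store UTF-16LE. Record headers and option flags are left out.

```java
import java.nio.charset.StandardCharsets;

public class StringEncodingChoice {

    // Hypothetical helper (not POI's API): decides whether a string must be
    // written as 16-bit characters under the contract proposed in the thread.
    //  - useUTF16 == true  -> always 16-bit
    //  - useUTF16 == false -> 8-bit (compressed) only if every char <= 0xFF,
    //                         otherwise fall back to 16-bit automatically
    public static boolean needsSixteenBit(String s, boolean useUTF16) {
        if (useUTF16) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    // Character bytes only (no headers): compressed strings keep just the low
    // byte of each char; uncompressed strings use UTF-16LE (2 bytes per char).
    public static byte[] encodeChars(String s, boolean useUTF16) {
        return needsSixteenBit(s, useUTF16)
                ? s.getBytes(StandardCharsets.UTF_16LE)
                : s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(encodeChars("abc", false).length);     // prints 3
        System.out.println(encodeChars("abc", true).length);      // prints 6
        System.out.println(encodeChars("\u20ac1", false).length); // prints 4
    }
}
```

The last case shows the automatic fallback: the euro sign does not fit in one byte, so the string is written as 16-bit even though the caller passed false.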

----- Original Message ----- 
From: "Andrew C. Oliver" <ac...@apache.org>
To: "POI Developers List" <po...@jakarta.apache.org>
Sent: Wednesday, September 03, 2003 10:36 PM
Subject: Re: unicode support


> If you ruin it...the unit tests should fail.  :-)


Re: unicode support

Posted by "Andrew C. Oliver" <ac...@apache.org>.
If you ruin it...the unit tests should fail.  :-)

On 9/3/03 5:27 PM, "A. Rothman" <am...@amichais.net> wrote:

> I'd still like to hear if anyone knows what that code section does before I
> ruin anything :-)


Re: unicode support

Posted by "A. Rothman" <am...@amichais.net>.
I had performance in mind as well, but then I saw there

String unicodeString = new String(getString().getBytes("Unicode"), "Unicode");

which costs exactly the same performance-wise, except that it doesn't do anything
(a Unicode round trip neither loses nor gains any data; it just decomposes and
recomposes the string). Also, "Unicode" appears in neither the JVM's required
encodings list nor Sun's supported encodings list, so it may be very
JVM-dependent.

I'd still like to hear if anyone knows what that code section does before I
ruin anything :-)
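The claim that the "Unicode" round trip is a no-op can be checked with a small sketch. It uses StandardCharsets.UTF_16 (Java 7+) as a portable stand-in for the non-standard "Unicode" name; the class and method names are illustrative only. For any well-formed string (no unpaired surrogates) the decoded result equals the input, so the conversion only costs extra allocations. An unpaired surrogate, by contrast, would be replaced during encoding, so the round trip is not an identity for arbitrary char sequences.

```java
import java.nio.charset.StandardCharsets;

public class Utf16RoundTrip {

    // Encodes to UTF-16 bytes (with a byte-order mark) and decodes them back.
    // The decoder consumes the BOM, so well-formed input comes back unchanged.
    public static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16);
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 \u05e9\u05dc\u05d5\u05dd \u20ac";
        System.out.println(sample.equals(roundTrip(sample))); // prints true
    }
}
```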

----- Original Message ----- 
From: "Andrew C. Oliver" <ac...@apache.org>
To: "POI Developers List" <po...@jakarta.apache.org>
Sent: Wednesday, September 03, 2003 10:18 PM
Subject: Re: unicode support


> We've done detection before... it's always a huge performance and memory hog.
> In any case, I don't think this method has been tried.  So go for it and see
> what happens.
>
> -Andy