Posted to dev@poi.apache.org by Jason Height <jh...@minotaur.apache.org> on 2005/08/24 05:43:38 UTC
String encoding (again)
All,
Any idea why the following line from UnicodeRecord (current HEAD rev and
previous) is actually required?
    String unicodeString = new String(getString().getBytes("Unicode"), "Unicode");
If I remove it and use:
    String unicodeString = getString();
1) All of the unit tests still pass, and
2) There is a 33x performance improvement with workbooks containing a
large number of strings
I am tempted to apply a patch to use my approach. Any
objections?
Jason
---------------------------------------------------------------------
To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/
Re: String encoding (again)
Posted by Glen Stampoultzis <gs...@iinet.net.au>.
The getBytes() call with no argument uses the platform's default character
set, but the line in the email uses the overload of getBytes() that
explicitly specifies "Unicode" as the character set. From what I can tell,
the code converts a Unicode string into a Unicode byte array and then
reads it back into a new Unicode string. The net effect is that it does a
lot of work for no actual reason.
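
A minimal sketch of the round trip described above, for illustration only. It is not the UnicodeRecord code: the class name RoundTripDemo and the helper roundTrip are hypothetical, and StandardCharsets.UTF_16 is used on the assumption that the "Unicode" charset name in the original line resolves to UTF-16. For well-formed text the decoded copy should compare equal to the input, which is why the extra encode/decode step looks redundant.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode to UTF-16 bytes and decode them again, mirroring the shape of
    // the line under discussion. UTF_16 stands in for the "Unicode" charset
    // name (an assumption, not taken from the POI source).
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16);
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String original = "Caf\u00e9 \u4e16\u754c";
        String copy = roundTrip(original);
        // The copy holds the same characters as the original.
        System.out.println(copy.equals(original)); // prints true
    }
}
```

The only visible effect here is a temporary byte array and a second String, which would be consistent with the slowdown Jason measured on workbooks with many strings.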
-- glen
Re: String encoding (again)
Posted by ac...@jboss.org.
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#getBytes()
public byte[] getBytes()
Encodes this String into a sequence of bytes using the platform's
default charset, storing the result into a new byte array.
The behavior of this method when this string cannot be encoded in
the default charset is unspecified. The CharsetEncoder class should be
used when more control over the encoding process is required.
Returns:
The resultant byte array
Since:
JDK1.1
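
To make the difference concrete, here is a small sketch contrasting a non-Unicode charset with an explicit UTF-16 one. The class name DefaultCharsetDemo and the helper viaCharset are hypothetical, and US-ASCII is only a stand-in for a platform whose default charset is not Unicode; the real default varies by system. It suggests the concern applies to the no-argument getBytes(), not to a call that names the charset explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Encode with the given charset and decode with the same one.
    static String viaCharset(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";

        // Whatever the platform default happens to be on this machine.
        System.out.println("platform default: " + Charset.defaultCharset());

        // US-ASCII cannot represent U+00E9, so it is replaced with '?'.
        System.out.println(viaCharset(original, StandardCharsets.US_ASCII)); // prints caf?

        // An explicit UTF-16 round trip keeps the text intact.
        System.out.println(viaCharset(original, StandardCharsets.UTF_16).equals(original)); // prints true
    }
}
```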
Re: String encoding (again)
Posted by Glen Stampoultzis <gs...@iinet.net.au>.
Aren't Java strings always stored as 2-byte Unicode, as defined by the spec?
Re: String encoding (again)
Posted by ac...@apache.org.
Not all systems default to unicode. Though that looks doofy to me.
Your code assumes they do. You'd need a flag saying "amIOnAnAS400()" or
something ;-)
-Andy
--
Andrew C. Oliver
SuperLink Software, Inc.
Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.