Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2012/08/27 16:29:07 UTC
[jira] [Updated] (UIMA-2460) Binary deserialization inefficient
[ https://issues.apache.org/jira/browse/UIMA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marshall Schor updated UIMA-2460:
---------------------------------
Affects Version/s: 2.4.0SDK
> Binary deserialization inefficient
> ----------------------------------
>
> Key: UIMA-2460
> URL: https://issues.apache.org/jira/browse/UIMA-2460
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 2.4.0SDK
> Reporter: Marshall Schor
> Assignee: Marshall Schor
> Priority: Minor
> Fix For: 2.4.1SDK
>
>
> The CAS binary deserialization code can be made (much) more space efficient. Currently, the char data used by the strings is read into a single char array, and each string is represented as an offset into this array plus a length. New Java strings are then created using new String(chararray, offset, length). This works, but each call allocates a new char array and copies the relevant range from the original array, so every string object ends up with its own char array object.
> The alternative is to reuse the original char array object, and not allocate any other char array objects. This can be done by:
> * making a temporary string from the entire char array object, and then
> * making the new strings using tempString.substring(offset, offset + length)
> For 1000 strings, this will save 999 char array object overheads (probably about 16 bytes each).
> Additional space savings are possible by reusing the same string object for equal strings.
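
The three variants described above (the current copying approach, the substring approach, and reuse of equal strings) might be sketched as follows. This is only an illustration under stated assumptions, not the actual UIMA deserialization code: the class name, method names, and the offsets/lengths arrays are hypothetical. Also note that whether substring shares the parent's char array depends on the JDK: it did so in the JDKs current when this issue was filed, but since Java 7 update 6 substring copies the characters, so the sharing benefit applies only to older runtimes. The reuse of equal strings works on any version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the UIMA code base.
public class StringTableSketch {

    // Current approach: every new String(char[], int, int) copies its range
    // into a freshly allocated char array.
    static String[] decodeCopying(char[] chars, int[] offsets, int[] lengths) {
        String[] result = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            result[i] = new String(chars, offsets[i], lengths[i]);
        }
        return result;
    }

    // Proposed approach: build one temporary string from the whole array,
    // then take substrings of it. On JDKs where substring shares the backing
    // array (before Java 7u6), no further char arrays are allocated.
    static String[] decodeViaSubstring(char[] chars, int[] offsets, int[] lengths) {
        String whole = new String(chars);
        String[] result = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            result[i] = whole.substring(offsets[i], offsets[i] + lengths[i]);
        }
        return result;
    }

    // Additional saving: hand out the same String object for equal strings.
    static String[] decodeDeduplicated(char[] chars, int[] offsets, int[] lengths) {
        String whole = new String(chars);
        Map<String, String> seen = new HashMap<>();
        String[] result = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            String s = whole.substring(offsets[i], offsets[i] + lengths[i]);
            String prior = seen.putIfAbsent(s, s); // null when s is first seen
            result[i] = (prior != null) ? prior : s;
        }
        return result;
    }

    public static void main(String[] args) {
        char[] chars = "foobarfoo".toCharArray();
        int[] offsets = {0, 3, 6};
        int[] lengths = {3, 3, 3};
        String[] out = decodeDeduplicated(chars, offsets, lengths);
        System.out.println(String.join(",", out)); // prints foo,bar,foo
        System.out.println(out[0] == out[2]);      // prints true (same object)
    }
}
```

All three methods return equal string contents; they differ only in how many objects they allocate. The map in decodeDeduplicated is needed only during deserialization and can be discarded afterwards.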
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira