Posted to dev@poi.apache.org by Michael Zalewski <za...@optonline.net> on 2005/03/22 15:18:41 UTC
EXTSST and CONTINUE records
I don't think HSSF is handling this correctly, although I'm not sure that it
actually causes problems. If it does, the difference would most likely show up
when loading large workbooks that have many unique strings.
HSSF seems to hard-code the 'StringsPerBucket' field in the EXTSST record to 8.
This means that if there are more than about 1,000 buckets (roughly 8,200 unique
strings), the EXTSST record will be too large to fit in a single BIFF record,
whose data is limited to 8,224 bytes, so HSSF will split the EXTSST record
across one or more CONTINUE records.
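To put numbers on this, here is a rough sketch of the size arithmetic for a fixed bucket size. It assumes the BIFF8 limit of 8,224 data bytes per record, one 8-byte entry per started bucket (ceiling division), and that the overflow is simply chopped into 8,224-byte CONTINUE chunks. The function name is hypothetical and is not part of HSSF.

```python
# Assumed BIFF8 limit: a record (or CONTINUE record) carries at most 8224 data bytes.
MAX_RECORD_DATA = 8224
ENTRY_SIZE = 8          # bytes per EXTSST bucket entry
COUNT_FIELD_SIZE = 2    # the strings-per-bucket field


def fixed_bucket_extsst_layout(n_strings: int, strings_per_bucket: int = 8):
    """Hypothetical helper: return (data_bytes, continue_records) for an
    EXTSST record whose strings-per-bucket value is fixed."""
    # Assumption: one bucket per started group of strings (ceiling division).
    buckets = (n_strings + strings_per_bucket - 1) // strings_per_bucket
    data_bytes = COUNT_FIELD_SIZE + ENTRY_SIZE * buckets
    # The first MAX_RECORD_DATA bytes stay in the EXTSST record itself;
    # anything beyond that spills into CONTINUE records.
    overflow = max(0, data_bytes - MAX_RECORD_DATA)
    continue_records = (overflow + MAX_RECORD_DATA - 1) // MAX_RECORD_DATA
    return data_bytes, continue_records


print(fixed_bucket_extsst_layout(8216))   # largest count that still fits
print(fixed_bucket_extsst_layout(65000))  # the 65,000-string case
```

Under these assumptions 8,216 strings (1,027 buckets) is the last count that fits in one record, and 65,000 strings give 65,002 bytes, i.e. the EXTSST record plus 7 CONTINUE records. A writer that only splits on entry boundaries would get slightly different chunk sizes.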
Excel seems to use a minimum of 8 for this field, but it increases the value so
that there are never more than 128 entries in the EXTSST record. So when Excel
writes the file, the EXTSST record never seems to be larger than 1026 bytes
(1030 counting the 4-byte record header). Each EXTSST entry is 8 bytes, and
there is a 2-byte field holding the number of strings per bucket:
2 + 8 * 128 = 1026.
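The Excel behaviour described above can be approximated with a small rule: start at 8 and grow the bucket size just enough that at most 128 buckets are needed. This is a sketch reconstructed from the observed output, not Excel's documented algorithm, and the function names are hypothetical.

```python
MIN_STRINGS_PER_BUCKET = 8   # observed minimum
MAX_ENTRIES = 128            # observed maximum number of EXTSST entries
ENTRY_SIZE = 8
COUNT_FIELD_SIZE = 2


def excel_like_strings_per_bucket(n_strings: int) -> int:
    # Smallest bucket size that keeps the entry count at or below 128,
    # but never below the minimum of 8.
    needed = (n_strings + MAX_ENTRIES - 1) // MAX_ENTRIES
    return max(MIN_STRINGS_PER_BUCKET, needed)


def excel_like_extsst_size(n_strings: int) -> int:
    # Data size of the EXTSST record (excluding the 4-byte header).
    spb = excel_like_strings_per_bucket(n_strings)
    buckets = (n_strings + spb - 1) // spb
    return COUNT_FIELD_SIZE + ENTRY_SIZE * buckets


print(excel_like_strings_per_bucket(65000), excel_like_extsst_size(65000))
```

Because the bucket size is at least n/128, the bucket count can never exceed 128, so the record stays at or below 1026 bytes no matter how many strings there are.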
So, for example, if a workbook has 65,000 unique strings, POI will create a
large EXTSST record followed by approximately 7 CONTINUE records. Excel,
however, creates a single 1026-byte EXTSST record and sets the 'strings per
bucket' field to 508 (65,000 / 128, rounded up), so that it writes the maximum
of 128 entries.
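For reference, a sketch of how such a 1026-byte body could be laid out with `struct`. The entry layout used here (4-byte stream offset, 2-byte offset within the SST/CONTINUE record, 2 reserved bytes) and the record id 0x00FF reflect my reading of the BIFF8 format and should be treated as assumptions; the offsets passed in are placeholders, and the helper names are hypothetical.

```python
import struct

EXTSST_SID = 0x00FF   # assumed BIFF8 record id for EXTSST
COUNT_FMT = "<H"      # 2-byte strings-per-bucket field, little-endian
ENTRY_FMT = "<IHH"    # assumed entry: stream offset, offset in record, reserved


def build_extsst_body(strings_per_bucket, bucket_offsets):
    """Serialize an EXTSST body from (stream_offset, record_offset) pairs.

    A real writer would capture these offsets while writing the SST record;
    here they are just whatever the caller passes in.
    """
    body = bytearray(struct.pack(COUNT_FMT, strings_per_bucket))
    for stream_offset, record_offset in bucket_offsets:
        body += struct.pack(ENTRY_FMT, stream_offset, record_offset, 0)
    return bytes(body)


def with_record_header(body):
    # 4-byte BIFF header: record id followed by data length.
    return struct.pack("<HH", EXTSST_SID, len(body)) + body


body = build_extsst_body(508, [(0, 0)] * 128)   # placeholder offsets
print(len(body), len(with_record_header(body)))
```

With 128 entries this yields the 1026-byte body and the 1030-byte record described above.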
BTW, JExcelAPI seems to handle this the same way Excel does.
---------------------------------------------------------------------
To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/