Posted to dev@poi.apache.org by Michael Zalewski <za...@optonline.net> on 2005/03/22 15:18:41 UTC

EXTSST and CONTINUE records

I don't think HSSF handles this correctly. I'm not sure that it causes problems 
in practice, but the difference may matter when loading large workbooks which 
have many unique strings.

HSSF seems to hard code the field 'StringsPerBucket' in the EXTSST record to 8. 
This means that once there are more than about 1000 buckets (roughly 8000 
unique strings), the EXTSST record is too large to fit in a single BIFF record, 
so HSSF spills the EXTSST record into one or more CONTINUE records.
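
As a rough sketch of the arithmetic (not HSSF's actual code), the size and the 
number of CONTINUE records can be estimated as below. The class and method 
names are made up for illustration. The 8224 byte limit on BIFF8 record data is 
my assumption, and so is the idea that each record is filled completely before 
the next CONTINUE starts.

```java
// Hypothetical helper, not part of POI: estimates the EXTSST size when
// 'StringsPerBucket' is fixed at 8.
public class FixedBucketExtSst {
    static final int STRINGS_PER_BUCKET = 8; // value HSSF appears to hard code
    static final int ENTRY_SIZE = 8;         // bytes per EXTSST entry
    static final int MAX_RECORD_DATA = 8224; // assumed BIFF8 data limit, header excluded

    // One entry (bucket) per 8 strings, rounded up.
    static int bucketCount(int uniqueStrings) {
        return (uniqueStrings + STRINGS_PER_BUCKET - 1) / STRINGS_PER_BUCKET;
    }

    // 2 bytes for the strings-per-bucket field, plus 8 bytes per entry.
    static int dataSize(int uniqueStrings) {
        return 2 + ENTRY_SIZE * bucketCount(uniqueStrings);
    }

    // CONTINUE records needed, assuming every record is filled to the limit.
    static int continueRecords(int uniqueStrings) {
        int overflow = dataSize(uniqueStrings) - MAX_RECORD_DATA;
        if (overflow <= 0) {
            return 0;
        }
        return (overflow + MAX_RECORD_DATA - 1) / MAX_RECORD_DATA;
    }

    public static void main(String[] args) {
        int n = 65000;
        System.out.println(n + " strings: " + dataSize(n) + " bytes, "
                + continueRecords(n) + " CONTINUE records");
    }
}
```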

Excel seems to set the field to a minimum of 8, but it will increase the value 
so that there are no more than 128 EXTSST entries in the EXTSST record. So when 
written from Excel, the size of the EXTSST record never seems to be larger than 
1026 bytes (1030 counting the 4 byte record header). Each EXTSST entry is 8 
bytes, and there is a 2 byte field to hold the number of strings per bucket. 
2 + 8 * 128 = 1026.

So for example, if there are 65,000 unique strings in a workbook, POI will 
create a large EXTSST record, followed by approximately 7 CONTINUE records. But 
Excel creates a single EXTSST record of 1026 bytes, and sets the 'strings per 
bucket' field to 508, which keeps the record at the maximum of 128 EXTSST 
entries.
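
Here is a minimal sketch of how the bucket size could be chosen to match what 
Excel appears to write. This is inferred from the observed numbers above, not 
taken from Excel or JExcelAPI. The class and method names are hypothetical, and 
rounding up with ceiling division is my guess at the rule.

```java
// Hypothetical helper, not part of POI: picks 'strings per bucket' so that
// the EXTSST record never holds more than 128 entries.
public class AdaptiveBucketExtSst {
    static final int MIN_STRINGS_PER_BUCKET = 8; // observed minimum
    static final int MAX_ENTRIES = 128;          // observed cap on EXTSST entries
    static final int ENTRY_SIZE = 8;             // bytes per EXTSST entry

    // Smallest bucket size (at least 8) that needs no more than 128 entries.
    static int stringsPerBucket(int uniqueStrings) {
        int needed = (uniqueStrings + MAX_ENTRIES - 1) / MAX_ENTRIES;
        return Math.max(MIN_STRINGS_PER_BUCKET, needed);
    }

    // Number of entries actually written for that bucket size.
    static int entryCount(int uniqueStrings) {
        int perBucket = stringsPerBucket(uniqueStrings);
        return (uniqueStrings + perBucket - 1) / perBucket;
    }

    // 2 bytes for the strings-per-bucket field, plus 8 bytes per entry.
    static int dataSize(int uniqueStrings) {
        return 2 + ENTRY_SIZE * entryCount(uniqueStrings);
    }

    public static void main(String[] args) {
        int n = 65000;
        System.out.println(n + " strings: " + stringsPerBucket(n)
                + " per bucket, " + entryCount(n) + " entries, "
                + dataSize(n) + " bytes");
    }
}
```

With this rule the record data can never exceed 2 + 8 * 128 = 1026 bytes, so 
no CONTINUE record is ever needed for EXTSST.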

BTW, JExcelAPI seems to do this like Excel.


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
Mailing List:    http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/