You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by "Cyril.Chang (張志遠)" <Cy...@infortrend.com> on 2009/11/19 08:14:34 UTC

HSSF output excel file size issue

Issue Description :
We Create an Excel File Manually with 218X1000 Cells say test_2.xls
And then generate the same data via POI HSSF to output another file say test_2_poi.xls
The file size
test_2.xls : 1335 KB
test_2_poi.xls : 3887 KB (is more than double size of test_2.xls)

Environment : 
OS : Linux Fedora 3
Web Server : Apache 1.3
PHP Version : 4.3.12-dev
PHP Java Bridge : 2.0.7
MYSQL Version : 5.0.9
JAVA : JDK_1.6.0_17
POI Version : poi-3.5-FINAL-20090928

Duplicate Procedure :
1.	Open MS Excel (My Version is MS Excel 2007) , Generate Random Number data with 218X1000 Cells , each cell is set to "RANDBETWEEN(1,10000000)"
2.	Copy All Data and Paste these data (worksheet1) to another worksheet2 with value (not formula)
3.	Delete other work sheet and preserve this worksheet2 only
4.	Save this file for Excel 97~2003 Version say test_2.xls , and Save another copy for csv format say test_2.csv
5.	Upload test_2.csv to Web Server root
6.	Run PHP Script(Please see the code below) to generate excel file say test_2_poi.xls
 
 Code :
 <?php

 	java_require("/usr/local/cyril/java/poi-3.5-FINAL-20090928.jar");  //directory of POI Package
   $workbook = new java("org.apache.poi.hssf.usermodel.HSSFWorkbook");
	$sheet = $workbook->createSheet("Test");
	$data_array = array();
	$fh = fopen("test_2.csv","r");
	while($tmp_data=fgetcsv($fh,100000000,","))
	{
		array_push($data_array,$tmp_data);
	}
	
	for($i=0;$i<count($data_array);$i++)
	{
		$row = $sheet->createRow($i);
		for($j=0;$j<count($data_array[$i]);$j++)
		{
			$cell = $row->createCell($j);
			$cell->setCellValue((float)$data_array[$i][$j]);
		}
	}
	$memoryStream = new Java('java.io.FileOutputStream','/usr/local/apache/htdocs/test2_poi.xls'); //Save to Web Server Root
	$workbook->write($memoryStream);
   
 ?php>

We Want to Know if there is any solution to avoid the doubled file size issue which is generated from POI HSSF ???



Re: HSSF output excel file size issue

Posted by Nick Burch <ni...@torchbox.com>.
On Thu, 19 Nov 2009, Cyril.Chang wrote:
> Issue Description : We Create an Excel File Manually with 218X1000 Cells 
> say test_2.xls And then generate the same data via POI HSSF to output 
> another file say test_2_poi.xls The file size test_2.xls : 1335 KB 
> test_2_poi.xls : 3887 KB (is more than double size of test_2.xls)

If your document contains lots of small numbers, this is to be expected. 
I suspect Excel is making using of MulBlank and MulRK records, which 
compress the data for several adjacent cells into one record. POI doesn't 
currently. You can probably buy a lot of 2TB disks for the cost of 
sponsoring the development to add the compression feature to poi...

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org