You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by thedrs <th...@gmail.com> on 2016/01/14 13:11:23 UTC

Re: How to get row count using SAX/XLSX

I also needed this for determining whether to use SAX or DOM API for a file
according to it size.
In SAX model you scan the actual XML so i unzipped a large xslx and opened
the XML and looked for header tags that can tell me about the real size
(rows / cols) of the xls without reading the whole xlsx.


I fount that one of the first tags is the dimension tag (e.g. <dimension
ref="A1:BK28674"/>) which can give you the amount of cols and rows. Sadly
the sax xml parser that you use (that implements SheetContentsHandler)
doesn't allow you to override a function that checks each tag (only cells
and row start and end and headers... i don't know what headers are but it is
not what i am looking for).

So, you are left with 2 choices:
1. open the xlsx zip programatically, search for the sheet xml, read it
using an XML SAX, find the above tag at the beginning and close it.

2. find the unzipped size of the xlsx and use that as a rough estimate.

I chose the later,
here is the code:

private boolean isLargeExcel(String excelFile) {
        final int LARGE_EXCEL_THRESHOLD = 40 * 1024 * 1024; // 40MB
        ZipInputStream zin=null;
        float expandedSize = 0;
        try {
	        FileInputStream fin = new FileInputStream(excelFile);
	        zin = new ZipInputStream(fin);
	        ZipEntry ze = null;
			while ((ze = zin.getNextEntry()) != null) 
					expandedSize+=ze.getSize();
	        } catch (Exception x) {
				return false;
			} finally {
	        	try {
					if (zin != null)
						zin.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
        }
        return (expandedSize > LARGE_EXCEL_THRESHOLD);
}




--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-get-row-count-using-SAX-XLSX-tp5529644p5721596.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: How to get row count using SAX/XLSX

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

Did you take a look at the sample XLSX2CSV at
https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java,
it shows how you can use the class XSSFReader from POI and a small
implementation of the interface SheetContentsHandler to have a SAX-like
application for an XLSX file. With this you just need a simple counter that
increases for every startRow() callback and thus you would not need to
handle all the unzipping/XML parsing yourself.

This should also work for very large files as they are read in a streaming
fashion without holding all of it in memory.

Dominik.

On Thu, Jan 14, 2016 at 1:11 PM, thedrs <th...@gmail.com> wrote:

> I also needed this for determining whether to use SAX or DOM API for a file
> according to it size.
> In SAX model you scan the actual XML so i unzipped a large xslx and opened
> the XML and looked for header tags that can tell me about the real size
> (rows / cols) of the xls without reading the whole xlsx.
>
>
> I fount that one of the first tags is the dimension tag (e.g. <dimension
> ref="A1:BK28674"/>) which can give you the amount of cols and rows. Sadly
> the sax xml parser that you use (that implements SheetContentsHandler)
> doesn't allow you to override a function that checks each tag (only cells
> and row start and end and headers... i don't know what headers are but it
> is
> not what i am looking for).
>
> So, you are left with 2 choices:
> 1. open the xlsx zip programatically, search for the sheet xml, read it
> using an XML SAX, find the above tag at the beginning and close it.
>
> 2. find the unzipped size of the xlsx and use that as a rough estimate.
>
> I chose the later,
> here is the code:
>
> private boolean isLargeExcel(String excelFile) {
>         final int LARGE_EXCEL_THRESHOLD = 40 * 1024 * 1024; // 40MB
>         ZipInputStream zin=null;
>         float expandedSize = 0;
>         try {
>                 FileInputStream fin = new FileInputStream(excelFile);
>                 zin = new ZipInputStream(fin);
>                 ZipEntry ze = null;
>                         while ((ze = zin.getNextEntry()) != null)
>                                         expandedSize+=ze.getSize();
>                 } catch (Exception x) {
>                                 return false;
>                         } finally {
>                         try {
>                                         if (zin != null)
>                                                 zin.close();
>                                 } catch (IOException e) {
>                                         e.printStackTrace();
>                                 }
>         }
>         return (expandedSize > LARGE_EXCEL_THRESHOLD);
> }
>
>
>
>
> --
> View this message in context:
> http://apache-poi.1045710.n5.nabble.com/How-to-get-row-count-using-SAX-XLSX-tp5529644p5721596.html
> Sent from the POI - User mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>