You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Thamodharan Balraj <th...@gmail.com> on 2017/11/22 15:35:15 UTC

Apache POI: Content Issue while processing huge excel file with SXSSF

Hi Everyone,

I am trying to generate a excel with SXSSF workbook. The size of excel may
shoot to 200-300MB, whose row count will be 5,00,000 and column count will
be around 150 approximately..! I get content issue while opening such large
excel file.
The version which I am using is APACHE POI 3.9
I had also tried with, 3.15 the same issue exist.

System holds enough ram and disk space as per the requirement.

3.9 performance seems to better than other versions, Is there any way to
rule this issue out in that version.?

Anyone, who has idea on above scenario, kindly help me to proceed further.

Regards
Thamodharan B

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Greg Woolsey <gr...@gmail.com>.
Perhaps there are some other less obvious limits POI isn't checking.  There
is quite a list at

https://support.office.com/en-us/article/Excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

On Wed, Nov 22, 2017 at 11:51 AM Andreas Reichel <
andreas@manticore-projects.com> wrote:

>
> On Wed, 2017-11-22 at 19:39 +0000, Greg Woolsey wrote:
> > POI streaming may not catch it (it should, but maybe that's the bug),
> > but
> >
> > OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will
> > consider it
> >
> > corrupt.
>
> We have hit that limit and POI throws an exception (both in XSSF and
> SXSSF).
> The mysterious "Excel does not open" happens with files below the row
> limit, when the number of cells exceed a certain number. Not sure
> though what exactly that limit is.
>
> Cheers
>
>

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Andreas Reichel <an...@manticore-projects.com>.
On Wed, 2017-11-22 at 19:39 +0000, Greg Woolsey wrote:
> POI streaming may not catch it (it should, but maybe that's the bug),
> but
> 
> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will
> consider it
> 
> corrupt.

We have hit that limit and POI throws an exception (both in XSSF and
SXSSF).
The mysterious "Excel does not open" happens with files below the row
limit, when the number of cells exceed a certain number. Not sure
though what exactly that limit is.

Cheers


Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Thamodharan Balraj <th...@gmail.com>.
Hi Greg,
As you suggested, I had opened an issue at Bugzilla.
Bug 61832

Regards
Thamodharan B

On Nov 29, 2017 11:20 AM, "Thamodharan Balraj" <th...@gmail.com>
wrote:

Sure, will do that.


On Nov 29, 2017 4:05 AM, "Greg Woolsey" <gr...@gmail.com> wrote:

> Can you open an issue at
>
> https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI
>
> with sample code and/or a sample workbook showing the problem?  That would
> help us reproduce it and track anything that requires a code change.
>
> On Tue, Nov 28, 2017 at 3:38 AM Thamodharan Balraj <
> thamodharan247@gmail.com> wrote:
>
>> Hi Greg Woolsey,
>>
>> I am not trying to put 5 million record in a sheet, I just tried with 0.5
>> million rows I.e (500,000 rows )with 150 column.
>>
>> As you also mentioned, I had refered the limits, I hadn't
>> crossed/violated any specification limits.
>>
>> Even I didn't apply any styles or format.
>> I just iterate and write a comman data of length 25 character in all the
>> cells.
>>
>> The workbook is done successful. But while opening it, I face the content
>> issue error. Please find the screenshot attached with the mail regarding to
>> it.
>>
>> Kindly help, if any one has idea on it.
>>
>> Regards
>> Thamodharan B
>>
>>
>>
>> On Nov 23, 2017 1:09 AM, "Greg Woolsey" <gr...@gmail.com> wrote:
>>
>> The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you
>> are
>> trying to put 5 million rows on a single sheet, Excel will never open it.
>>
>> POI streaming may not catch it (it should, but maybe that's the bug), but
>> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider
>> it
>> corrupt.
>>
>> I've written out workbooks with < 1M rows in the past, without using the
>> streaming format even.  I've just had to notice when I reached the max #
>> on
>> a sheet and start a new sheet at that point.
>>
>> See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007
>> enum
>> for version specific limits.
>>
>> On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <
>> thamodharan247@gmail.com>
>> wrote:
>>
>> > Hi Everyone,
>> >
>> > I am trying to generate a excel with SXSSF workbook. The size of excel
>> may
>> > shoot to 200-300MB, whose row count will be 5,00,000 and column count
>> will
>> > be around 150 approximately..! I get content issue while opening such
>> large
>> > excel file.
>> > The version which I am using is APACHE POI 3.9
>> > I had also tried with, 3.15 the same issue exist.
>> >
>> > System holds enough ram and disk space as per the requirement.
>> >
>> > 3.9 performance seems to better than other versions, Is there any way to
>> > rule this issue out in that version.?
>> >
>> > Anyone, who has idea on above scenario, kindly help me to proceed
>> further.
>> >
>> > Regards
>> > Thamodharan B
>> >
>>
>>
>>
>>

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Thamodharan Balraj <th...@gmail.com>.
Sure, will do that.

On Nov 29, 2017 4:05 AM, "Greg Woolsey" <gr...@gmail.com> wrote:

> Can you open an issue at
>
> https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI
>
> with sample code and/or a sample workbook showing the problem?  That would
> help us reproduce it and track anything that requires a code change.
>
> On Tue, Nov 28, 2017 at 3:38 AM Thamodharan Balraj <
> thamodharan247@gmail.com> wrote:
>
>> Hi Greg Woolsey,
>>
>> I am not trying to put 5 million record in a sheet, I just tried with 0.5
>> million rows I.e (500,000 rows )with 150 column.
>>
>> As you also mentioned, I had refered the limits, I hadn't
>> crossed/violated any specification limits.
>>
>> Even I didn't apply any styles or format.
>> I just iterate and write a comman data of length 25 character in all the
>> cells.
>>
>> The workbook is done successful. But while opening it, I face the content
>> issue error. Please find the screenshot attached with the mail regarding to
>> it.
>>
>> Kindly help, if any one has idea on it.
>>
>> Regards
>> Thamodharan B
>>
>>
>>
>> On Nov 23, 2017 1:09 AM, "Greg Woolsey" <gr...@gmail.com> wrote:
>>
>> The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you
>> are
>> trying to put 5 million rows on a single sheet, Excel will never open it.
>>
>> POI streaming may not catch it (it should, but maybe that's the bug), but
>> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider
>> it
>> corrupt.
>>
>> I've written out workbooks with < 1M rows in the past, without using the
>> streaming format even.  I've just had to notice when I reached the max #
>> on
>> a sheet and start a new sheet at that point.
>>
>> See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007
>> enum
>> for version specific limits.
>>
>> On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <
>> thamodharan247@gmail.com>
>> wrote:
>>
>> > Hi Everyone,
>> >
>> > I am trying to generate a excel with SXSSF workbook. The size of excel
>> may
>> > shoot to 200-300MB, whose row count will be 5,00,000 and column count
>> will
>> > be around 150 approximately..! I get content issue while opening such
>> large
>> > excel file.
>> > The version which I am using is APACHE POI 3.9
>> > I had also tried with, 3.15 the same issue exist.
>> >
>> > System holds enough ram and disk space as per the requirement.
>> >
>> > 3.9 performance seems to better than other versions, Is there any way to
>> > rule this issue out in that version.?
>> >
>> > Anyone, who has idea on above scenario, kindly help me to proceed
>> further.
>> >
>> > Regards
>> > Thamodharan B
>> >
>>
>>
>>
>>

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Greg Woolsey <gr...@gmail.com>.
Can you open an issue at

https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI

with sample code and/or a sample workbook showing the problem?  That would
help us reproduce it and track anything that requires a code change.

On Tue, Nov 28, 2017 at 3:38 AM Thamodharan Balraj <th...@gmail.com>
wrote:

> Hi Greg Woolsey,
>
> I am not trying to put 5 million record in a sheet, I just tried with 0.5
> million rows I.e (500,000 rows )with 150 column.
>
> As you also mentioned, I had refered the limits, I hadn't crossed/violated
> any specification limits.
>
> Even I didn't apply any styles or format.
> I just iterate and write a comman data of length 25 character in all the
> cells.
>
> The workbook is done successful. But while opening it, I face the content
> issue error. Please find the screenshot attached with the mail regarding to
> it.
>
> Kindly help, if any one has idea on it.
>
> Regards
> Thamodharan B
>
>
>
> On Nov 23, 2017 1:09 AM, "Greg Woolsey" <gr...@gmail.com> wrote:
>
> The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you are
> trying to put 5 million rows on a single sheet, Excel will never open it.
>
> POI streaming may not catch it (it should, but maybe that's the bug), but
> OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider it
> corrupt.
>
> I've written out workbooks with < 1M rows in the past, without using the
> streaming format even.  I've just had to notice when I reached the max # on
> a sheet and start a new sheet at that point.
>
> See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007 enum
> for version specific limits.
>
> On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <
> thamodharan247@gmail.com>
> wrote:
>
> > Hi Everyone,
> >
> > I am trying to generate a excel with SXSSF workbook. The size of excel
> may
> > shoot to 200-300MB, whose row count will be 5,00,000 and column count
> will
> > be around 150 approximately..! I get content issue while opening such
> large
> > excel file.
> > The version which I am using is APACHE POI 3.9
> > I had also tried with, 3.15 the same issue exist.
> >
> > System holds enough ram and disk space as per the requirement.
> >
> > 3.9 performance seems to better than other versions, Is there any way to
> > rule this issue out in that version.?
> >
> > Anyone, who has idea on above scenario, kindly help me to proceed
> further.
> >
> > Regards
> > Thamodharan B
> >
>
>
>
>

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Thamodharan Balraj <th...@gmail.com>.
Hi Greg Woolsey,

I am not trying to put 5 million record in a sheet, I just tried with 0.5
million rows I.e (500,000 rows )with 150 column.

As you also mentioned, I had refered the limits, I hadn't crossed/violated
any specification limits.

Even I didn't apply any styles or format.
I just iterate and write a comman data of length 25 character in all the
cells.

The workbook is done successful. But while opening it, I face the content
issue error. Please find the screenshot attached with the mail regarding to
it.

Kindly help, if any one has idea on it.

Regards
Thamodharan B



On Nov 23, 2017 1:09 AM, "Greg Woolsey" <gr...@gmail.com> wrote:

The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you are
trying to put 5 million rows on a single sheet, Excel will never open it.

POI streaming may not catch it (it should, but maybe that's the bug), but
OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider it
corrupt.

I've written out workbooks with < 1M rows in the past, without using the
streaming format even.  I've just had to notice when I reached the max # on
a sheet and start a new sheet at that point.

See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007 enum
for version specific limits.

On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <thamodharan247@gmail.com
>
wrote:

> Hi Everyone,
>
> I am trying to generate a excel with SXSSF workbook. The size of excel may
> shoot to 200-300MB, whose row count will be 5,00,000 and column count will
> be around 150 approximately..! I get content issue while opening such
large
> excel file.
> The version which I am using is APACHE POI 3.9
> I had also tried with, 3.15 the same issue exist.
>
> System holds enough ram and disk space as per the requirement.
>
> 3.9 performance seems to better than other versions, Is there any way to
> rule this issue out in that version.?
>
> Anyone, who has idea on above scenario, kindly help me to proceed further.
>
> Regards
> Thamodharan B
>

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Greg Woolsey <gr...@gmail.com>.
The OOXML/Excel 2007+ limit is 2^20 (1,048,576) rows per sheet.  If you are
trying to put 5 million rows on a single sheet, Excel will never open it.

POI streaming may not catch it (it should, but maybe that's the bug), but
OOXML compliant apps (Excel, OpenOffice/LibreOffice, etc.) will consider it
corrupt.

I've written out workbooks with < 1M rows in the past, without using the
streaming format even.  I've just had to notice when I reached the max # on
a sheet and start a new sheet at that point.

See the properties of  org.apache.poi.ss.SpreadsheetVersion.EXCEL2007 enum
for version specific limits.

On Wed, Nov 22, 2017 at 7:35 AM Thamodharan Balraj <th...@gmail.com>
wrote:

> Hi Everyone,
>
> I am trying to generate a excel with SXSSF workbook. The size of excel may
> shoot to 200-300MB, whose row count will be 5,00,000 and column count will
> be around 150 approximately..! I get content issue while opening such large
> excel file.
> The version which I am using is APACHE POI 3.9
> I had also tried with, 3.15 the same issue exist.
>
> System holds enough ram and disk space as per the requirement.
>
> 3.9 performance seems to better than other versions, Is there any way to
> rule this issue out in that version.?
>
> Anyone, who has idea on above scenario, kindly help me to proceed further.
>
> Regards
> Thamodharan B
>

Re: Apache POI: Content Issue while processing huge excel file with SXSSF

Posted by Andreas Reichel <an...@manticore-projects.com>.
Thamodharam,

we use Apache POI 3.18 development snapshot and have no problem to
create SXSSF workbooks of 300 MByte, with up to 1 Mill. Rows and around
40 columns. On the server, that took up to 20 GB of RAM.
However, we were never able to open such files with Libre Office,
Gnumeric or Excel (while plenty of resources were available on the
client computer). The spreadsheet programs just freeze.

The interesting part is: We exactly the same framework, we are good to
create SXSSF workbooks of 100 MByte (or less) which actually open in
Libre Office and Gnumeric and Excel quite fast. 

We are not sure, if there is a limitation in the Spread Sheet programs
or Apache POI hits a problem.

Cheers


On Wed, 2017-11-22 at 21:05 +0530, Thamodharan Balraj wrote:
> Hi Everyone,
> 
> I am trying to generate a excel with SXSSF workbook. The size of excel may
> shoot to 200-300MB, whose row count will be 5,00,000 and column count will
> be around 150 approximately..! I get content issue while opening such large
> excel file.
> The version which I am using is APACHE POI 3.9
> I had also tried with, 3.15 the same issue exist.
> 
> System holds enough ram and disk space as per the requirement.
> 
> 3.9 performance seems to better than other versions, Is there any way to
> rule this issue out in that version.?
> 
> Anyone, who has idea on above scenario, kindly help me to proceed further.
> 
> Regards
> Thamodharan B