Posted to dev@poi.apache.org by bu...@apache.org on 2018/11/01 00:17:41 UTC

[Bug 62872] New: Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

            Bug ID: 62872
           Summary: Writing large files with 800k rows gives
                    java.io.IOException: This archive contains unclosed
                    entries.
           Product: POI
           Version: 4.0.0-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SXSSF
          Assignee: dev@poi.apache.org
          Reporter: andreas@manticore-projects.com
  Target Milestone: ---

Created attachment 36225
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36225&action=edit
Sample for java.io.IOException: This archive contains unclosed entries

The new 4.0 release seems to have a problem when writing large XLSX files (e.g.
more than 100k but fewer than 1 million rows) using SXSSFWorkbook:

java.io.IOException: This archive contains unclosed entries.
        at
org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.finish(ZipArchiveOutputStream.java:467)
        at
org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:406)
        at
org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:936)

Please note that small files (e.g. fewer than 100k rows) seem to work fine, and
everything also works fine with 3.18 (same code, same data).

The attached sample reproduces the error.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #11 from PJ Fanning <fa...@yahoo.com> ---
I added a method to SXSSFWorkbook so that you can set the Zip64Mode -
https://svn.apache.org/viewvc?view=revision&revision=1848179


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #1 from andreas@manticore-projects.com ---
Created attachment 36226
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36226&action=edit
Test Case


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

andreas@manticore-projects.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andreas@manticore-projects.com


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #8 from andreas@manticore-projects.com ---
LibreOffice does not seem to support ZIP64.

I have created 2 small files with the same content, but only one written as
ZIP64.

Gnumeric was able to open both files; LibreOffice fails on the ZIP64 one.
I have updated LibreOffice bug 82984 accordingly
(https://bugs.documentfoundation.org/show_bug.cgi?id=82984#c10).

Setting Zip64 to Always for archives exceeding 4GB and/or 65536 entries would
be the correct solution. When doing so, a warning about LibreOffice
compatibility would be great.
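The rule proposed above can be expressed as a small decision helper. This is a purely illustrative sketch: the class, method, and parameter names are hypothetical (not Apache POI or Commons Compress API), and using the total archive size as a stand-in for the largest local-header offset is a simplification. The limits themselves come from the classic ZIP format: 32-bit size/offset fields and a 16-bit entry count, with the all-ones values (0xFFFFFFFF, 0xFFFF) reserved as ZIP64 markers, which is why the entry threshold below is 65535 rather than 65536.

```java
/**
 * Hypothetical helper (not part of Apache POI or Commons Compress) sketching
 * the rule from comment #8: use ZIP64 only when classic ZIP limits are reached.
 */
public final class Zip64Policy {

    // Classic ZIP stores entry sizes and offsets in unsigned 32-bit fields;
    // the all-ones value 0xFFFFFFFF is reserved to mean "see the ZIP64 record".
    public static final long ZIP32_SIZE_LIMIT = 0xFFFFFFFFL;

    // The end-of-central-directory record stores the entry count in an
    // unsigned 16-bit field; 0xFFFF is likewise reserved for ZIP64.
    public static final int ZIP32_ENTRY_LIMIT = 0xFFFF;

    private Zip64Policy() {
    }

    /**
     * Simplified check: true if the largest entry, the whole archive (standing
     * in for the largest offset), or the entry count reaches a classic limit.
     */
    public static boolean needsZip64(long largestEntrySize, long archiveSize, int entryCount) {
        return largestEntrySize >= ZIP32_SIZE_LIMIT
                || archiveSize >= ZIP32_SIZE_LIMIT
                || entryCount >= ZIP32_ENTRY_LIMIT;
    }

    public static void main(String[] args) {
        // An ordinary workbook: a few dozen entries, well under 4 GiB.
        System.out.println(needsZip64(50_000_000L, 5_000_000L, 20)); // false
        // A workbook whose sheet XML is 5 GiB uncompressed.
        System.out.println(needsZip64(5L * 1024 * 1024 * 1024, 250_000_000L, 20)); // true
    }
}
```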


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
Reproducible with 600_000 rows, not reproducible with 400_000 rows. 

The temporary file is approx. 200 MB, so this is not a 2 GB issue.

When the exception is thrown in ZipArchiveOutputStream, the variables have the
following contents:

this.finished = false
this.entry = {ZipArchiveOutputStream$CurrentEntry@2501} 
 entry = {ZipArchiveEntry@2523} "xl/worksheets/sheet1.xml"
  method = 8
  size = 4497373312
  internalAttributes = 0
  versionRequired = 0
  versionMadeBy = 0
  platform = 0
  rawFlag = 0
  externalAttributes = 0
  alignment = 0
  extraFields = null
  unparseableExtra = null
  name = "xl/worksheets/sheet1.xml"
  rawName = null
  gpb = {GeneralPurposeBit@2525} 
  localHeaderOffset = -1
  dataOffset = -1
  isStreamContiguous = false
  nameSource = {ZipArchiveEntry$NameSource@2526} "NAME"
  commentSource = {ZipArchiveEntry$CommentSource@2527} "COMMENT"
  ZipEntry.name = "xl/worksheets/sheet1.xml"
  xdostime = 276176132385
  mtime = null
  atime = null
  ctime = null
  crc = 2326399640
  ZipEntry.size = -1
  csize = 228388641
  ZipEntry.method = -1
  flag = 0
  extra = null
  comment = null
 localDataStart = 2340
 dataStart = 2380
 bytesRead = 4497373312
 causedUseOfZip64 = false
 hasWritten = false
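The dump helps explain the finding in comment #5. The compressed size (csize) is only about 218 MiB, but the uncompressed size of xl/worksheets/sheet1.xml (size and bytesRead) is 4497373312 bytes, roughly 4.2 GiB. That value does not fit in the 32-bit size field of a classic ZIP entry, so writing it requires ZIP64. A small check of that arithmetic, with the two values copied from the dump (the class and method names are illustrative only):

```java
/**
 * Compares the sizes from the debugger dump in comment #3 with the
 * classic (non-ZIP64) ZIP size field. Values are copied from the dump.
 */
public final class SheetSizeCheck {

    // Classic ZIP entry sizes are unsigned 32-bit; 0xFFFFFFFF is reserved
    // as the ZIP64 marker, so sizes at or above it need ZIP64 fields.
    public static final long ZIP32_SIZE_MARKER = 0xFFFFFFFFL;

    public static boolean needsZip64(long size) {
        return size >= ZIP32_SIZE_MARKER;
    }

    /** Bytes by which a size exceeds 4 GiB (2^32 bytes). */
    public static long bytesAbove4GiB(long size) {
        return size - (1L << 32);
    }

    public static void main(String[] args) {
        long size = 4_497_373_312L; // "size" and "bytesRead" for sheet1.xml
        long csize = 228_388_641L;  // "csize", the compressed size

        System.out.println("uncompressed MiB:  " + size / (1024 * 1024));
        System.out.println("compressed MiB:    " + csize / (1024 * 1024));
        System.out.println("needs ZIP64:       " + needsZip64(size));
        System.out.println("bytes above 4 GiB: " + bytesAbove4GiB(size));
        System.out.printf("compression ratio: %.1f:1%n", (double) size / csize);
    }
}
```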


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #15 from PJ Fanning <fa...@yahoo.com> ---
This is an old issue - not really sure what was wrong over 2 years ago.


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #12 from Dominik Stadler <do...@gmx.at> ---
I think we provide as much as we can here by allowing the user to select
Zip64Mode if necessary.

Unfortunately not all tools seem to be able to handle such files, so we cannot
use this as the default right now.

We may be able to switch the default in the future once all the major tools
support it.

Please reopen if there is still something that can be done in Apache POI here.


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #10 from andreas@manticore-projects.com ---
Created attachment 36287
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36287&action=edit
ALWAYS


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #9 from andreas@manticore-projects.com ---
Created attachment 36286
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36286&action=edit
AS NEEDED


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

Andreas Beeker <ki...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |57342


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=57342
[Bug 57342] Writing very large file via SXSSF leads to corrupt file

[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #7 from Andreas Beeker <ki...@apache.org> ---
Created attachment 36258
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36258&action=edit
modified test

I forgot to mention that even with POI 3.17 I couldn't produce a file which
could be opened by LibreOffice ... I haven't tried with MS Excel. This is also
due to the double "i++" in the test, which creates more than the maximum of
1048576 logical rows with a ROW_COUNT of 600000.

Please run your fixed test again, with POI 3.17 vs. the current trunk.

To compare the unzipped XML files, you might want to use the same test data.
I've attached my modified version, which inserts reproducible timestamps.


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #4 from Andreas Beeker <ki...@apache.org> ---
My guess is that the real exception is masked in SXSSFWorkbook.injectData.
If you add some checkpoint variables, you can see that it enters the "finally"
block for "zos.closeArchiveEntry()", but that call never actually completes.

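The masking described in comment #4 is ordinary Java behaviour: if a `finally` block throws, its exception replaces whatever the `try` block was already throwing. The following self-contained sketch uses hypothetical methods (it is not POI's injectData code, and the exception messages are invented) to show the effect, plus one way to keep the original cause by attaching the cleanup failure with `Throwable.addSuppressed`.

```java
import java.io.IOException;

/**
 * Illustrates the exception masking described in comment #4 with
 * hypothetical methods; this is not POI's injectData code.
 */
public final class MaskedExceptionDemo {

    // Stands in for the real failure, e.g. an entry exceeding the 4 GB limit.
    private static void writeEntry() throws IOException {
        throw new IOException("entry size exceeds the 4 GB ZIP limit");
    }

    // Stands in for cleanup that fails because the entry was never closed.
    private static void finishArchive() throws IOException {
        throw new IOException("This archive contains unclosed entries.");
    }

    /** Plain try/finally: the finally exception replaces the original one. */
    public static void masked() throws IOException {
        try {
            writeEntry();
        } finally {
            finishArchive();
        }
    }

    /** Keeps the original exception and records the cleanup failure as suppressed. */
    public static void preserved() throws IOException {
        IOException primary = null;
        try {
            writeEntry();
        } catch (IOException e) {
            primary = e;
            throw e;
        } finally {
            try {
                finishArchive();
            } catch (IOException cleanupFailure) {
                if (primary == null) {
                    throw cleanupFailure; // nothing to preserve, report cleanup failure
                }
                primary.addSuppressed(cleanupFailure);
            }
        }
    }

    public static void main(String[] args) {
        try {
            masked();
        } catch (IOException e) {
            System.out.println("masked:    " + e.getMessage());
        }
        try {
            preserved();
        } catch (IOException e) {
            System.out.println("preserved: " + e.getMessage()
                    + " (suppressed: " + e.getSuppressed()[0].getMessage() + ")");
        }
    }
}
```

Since Java 7, try-with-resources applies the same `addSuppressed` handling automatically when a resource's `close()` fails after the body has already thrown.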

[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #14 from Dominik Stadler <do...@gmx.at> ---
Ah, thanks for the note. But files created via SXSSFWorkbook seem to still work
in LibreOffice even with this, so what was making them corrupt here?


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #5 from Andreas Beeker <ki...@apache.org> ---
It is a masked exception complaining about the 4 GB limit.
When using "zos.setUseZip64(Zip64Mode.Always)", the test runs through
successfully, but the result can't be opened in LibreOffice.

I'm now checking whether the XML differs between 3.17 and 4.0.0, or whether the
problem is caused by the 64-bit zip stream.


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |regression

--- Comment #2 from Dominik Stadler <do...@gmx.at> ---
Based on discussion on user-list, this worked in 3.17.


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #13 from PJ Fanning <fa...@yahoo.com> ---
Dominik - Zip64Mode.Always is the default in SXSSFWorkbook as of POI 5.0.0


[Bug 62872] Writing large files with 800k rows gives java.io.IOException: This archive contains unclosed entries.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872

--- Comment #6 from Andreas Beeker <ki...@apache.org> ---
Patched via r1845629

The change now implicitly sets the stream entry to 64-bit based on the given
file size; the other stream entries stay in normal mode (... I guess ...)
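Comment #6 describes choosing ZIP64 per entry from a size that is known up front, while the other entries keep classic headers. The sketch below only illustrates that selection logic in plain Java; it is a hypothetical reconstruction, not the code from r1845629, and the map of entry sizes (and all sizes except sheet1.xml's, taken from the dump in comment #3) are assumed inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical per-entry selection in the spirit of comment #6: only entries
 * whose known size reaches the classic ZIP limit are marked for ZIP64.
 * Not the actual r1845629 change.
 */
public final class PerEntryZip64 {

    // 0xFFFFFFFF is the largest 32-bit value and is reserved as the ZIP64 marker.
    public static final long ZIP32_SIZE_MARKER = 0xFFFFFFFFL;

    /** Returns the names of entries (in iteration order) that need ZIP64 size fields. */
    public static List<String> entriesNeedingZip64(Map<String, Long> entrySizes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : entrySizes.entrySet()) {
            if (e.getValue() >= ZIP32_SIZE_MARKER) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("[Content_Types].xml", 1_200L);              // illustrative size
        sizes.put("xl/workbook.xml", 800L);                    // illustrative size
        sizes.put("xl/worksheets/sheet1.xml", 4_497_373_312L); // from comment #3's dump
        System.out.println(entriesNeedingZip64(sizes)); // [xl/worksheets/sheet1.xml]
    }
}
```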

I've compared the sheet content in 3.17 vs. trunk, but there weren't any
differences.

As this still produces files which can't be opened in Libre/MS Office, I'm
leaving this issue open.
