Posted to dev@poi.apache.org by bu...@apache.org on 2018/11/01 00:17:41 UTC
[Bug 62872] New: Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
Bug ID: 62872
Summary: Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed
entries.
Product: POI
Version: 4.0.0-FINAL
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: SXSSF
Assignee: dev@poi.apache.org
Reporter: andreas@manticore-projects.com
Target Milestone: ---
Created attachment 36225
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36225&action=edit
Sample for java.io.IOException: This archive contains unclosed entries
The new 4.0 release seems to have a problem when writing large XLSX files (e.g.
more than 100k rows but fewer than 1 million rows) using SXSSFWorkbook:
java.io.IOException: This archive contains unclosed entries.
at
org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.finish(ZipArchiveOutputStream.java:467)
at
org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:406)
at
org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:936)
Please note that small files (e.g. fewer than 100k rows) seem to work fine, and
everything also works fine with 3.18 (same code, same data).
The attached sample reproduces the error.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #11 from PJ Fanning <fa...@yahoo.com> ---
I added a method to SXSSFWorkbook so that you can set the Zip64Mode -
https://svn.apache.org/viewvc?view=revision&revision=1848179
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #1 from andreas@manticore-projects.com ---
Created attachment 36226
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36226&action=edit
Test Case
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
andreas@manticore-projects.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |andreas@manticore-projects.com
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #8 from andreas@manticore-projects.com ---
LibreOffice does not seem to support ZIP64.
I have created 2 small files with the same content, but only one written as
ZIP64.
Gnumeric was able to open both files; LibreOffice fails on the ZIP64 one.
I have refreshed LibreOffice bug 82984 accordingly
(https://bugs.documentfoundation.org/show_bug.cgi?id=82984#c10).
Setting Zip64 to Always for archives exceeding 4GB and/or 65536 entries would
be the correct solution. When doing so, a warning related to the use of
LibreOffice would be great.
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
Reproducible with 600_000 rows, not reproducible with 400_000 rows.
The temporary file is approx. 200 MB, so this is not a 2 GB issue.
When the exception is thrown in ZipArchiveOutputStream, the variables have the
following contents:
this.finished = false
this.entry = {ZipArchiveOutputStream$CurrentEntry@2501}
entry = {ZipArchiveEntry@2523} "xl/worksheets/sheet1.xml"
method = 8
size = 4497373312
internalAttributes = 0
versionRequired = 0
versionMadeBy = 0
platform = 0
rawFlag = 0
externalAttributes = 0
alignment = 0
extraFields = null
unparseableExtra = null
name = "xl/worksheets/sheet1.xml"
rawName = null
gpb = {GeneralPurposeBit@2525}
localHeaderOffset = -1
dataOffset = -1
isStreamContiguous = false
nameSource = {ZipArchiveEntry$NameSource@2526} "NAME"
commentSource = {ZipArchiveEntry$CommentSource@2527} "COMMENT"
ZipEntry.name = "xl/worksheets/sheet1.xml"
xdostime = 276176132385
mtime = null
atime = null
ctime = null
crc = 2326399640
ZipEntry.size = -1
csize = 228388641
ZipEntry.method = -1
flag = 0
extra = null
comment = null
localDataStart = 2340
dataStart = 2380
bytesRead = 4497373312
causedUseOfZip64 = false
hasWritten = false
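The dump above already hints at the cause: the uncompressed size of the sheet entry no longer fits into the 32-bit size field of a classic ZIP entry, even though the compressed size is only about 228 MB. Below is a small sketch of that arithmetic using the values from the dump. The class and method names are made up for illustration and are not part of POI or Commons Compress; the only assumption is the standard ZIP rule that 0xFFFFFFFF is reserved as the ZIP64 marker.

```java
class Zip32LimitCheck {
    // 0xFFFFFFFF is the largest value of a 32-bit ZIP size field and is
    // reserved as the "look in the ZIP64 extra field" marker.
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    // Hypothetical helper: true if a size cannot be stored without ZIP64.
    static boolean needsZip64(long size) {
        return size >= ZIP64_MAGIC;
    }

    public static void main(String[] args) {
        long size = 4497373312L;  // "size" / "bytesRead" from the dump
        long csize = 228388641L;  // "csize" from the dump

        System.out.println("uncompressed size needs ZIP64: " + needsZip64(size));
        System.out.println("compressed size needs ZIP64:   " + needsZip64(csize));
    }
}
```

So the roughly 200 MB temporary file is misleading: it is the uncompressed sheet XML that crosses the 4 GB boundary.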
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #15 from PJ Fanning <fa...@yahoo.com> ---
This is an old issue - not really sure what was wrong over 2 years ago.
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
Dominik Stadler <do...@gmx.at> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #12 from Dominik Stadler <do...@gmx.at> ---
I think we provide as much as we can here by allowing the user to select
Zip64Mode if necessary.
Unfortunately not all tools seem to be able to handle such files, so we cannot
use this as default right now.
We may be able to switch it in the future when all the major tools have enabled
support.
Please reopen if there is still something that can be done in Apache POI here.
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #10 from andreas@manticore-projects.com ---
Created attachment 36287
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36287&action=edit
ALWAYS
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #9 from andreas@manticore-projects.com ---
Created attachment 36286
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36286&action=edit
AS NEEDED
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
Andreas Beeker <ki...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |57342
Referenced Bugs:
https://bz.apache.org/bugzilla/show_bug.cgi?id=57342
[Bug 57342] Writing very large file via SXSSF leads to corrupt file
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #7 from Andreas Beeker <ki...@apache.org> ---
Created attachment 36258
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36258&action=edit
modified test
I forgot to mention that with POI 3.17, too, I couldn't produce a file which
could be opened by Libre Office ... I haven't tried with MS Excel. This is also
due to the double "i++" in the test, which creates more than the 1048576 logical
rows with a ROW_COUNT of 600000.
Please run your fixed test again, with POI 3.17 vs. the current trunk.
For comparison of the unzipped xmls, you might want to use the same test data.
I've attached my modified version which inserts reproducible timestamps.
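To illustrate the double "i++" effect: the sketch below assumes the test advances its row index twice per written record. That loop shape is a guess and not a copy of the attached test, and all names in it are hypothetical. It only does the index arithmetic and does not touch POI.

```java
class DoubleIncrementDemo {
    static final int ROW_COUNT = 600_000;
    static final int XLSX_MAX_ROWS = 1_048_576; // rows per sheet in XLSX

    // Returns the highest row index used when the index advances
    // `step` times for each of `records` written records.
    static int highestRowIndex(int records, int step) {
        int rowIndex = 0;
        int highest = -1;
        for (int record = 0; record < records; record++) {
            highest = rowIndex;  // the row that would be created here
            rowIndex += step;    // step == 2 models the double "i++"
        }
        return highest;
    }

    public static void main(String[] args) {
        int single = highestRowIndex(ROW_COUNT, 1);
        int doubled = highestRowIndex(ROW_COUNT, 2);
        System.out.println("single increment: highest index " + single
                + ", within limit: " + (single < XLSX_MAX_ROWS));
        System.out.println("double increment: highest index " + doubled
                + ", within limit: " + (doubled < XLSX_MAX_ROWS));
    }
}
```

Under that assumption a ROW_COUNT of 600000 reaches row indexes around 1.2 million, which is past the sheet limit, while a single increment stays well inside it.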
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #4 from Andreas Beeker <ki...@apache.org> ---
My guess is that this is a shadowed (masked) exception in
SXSSFWorkbook.injectData. If you add some checkpoint variables, it enters the
"finally" for "zos.closeArchiveEntry()", but doesn't actually finish it.
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #14 from Dominik Stadler <do...@gmx.at> ---
Ah, thanks for the note. But files created via SXSSFWorkbook seem to still work
in LibreOffice even with this, so what was making them corrupt here?
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #5 from Andreas Beeker <ki...@apache.org> ---
It is a shadowed (masked) exception complaining about the 4 GB limit.
When using "zos.setUseZip64(Zip64Mode.Always)" the test runs through
successfully, but the result can't be opened in Libre Office.
I'm now checking if the xml differs on 3.17 vs 4.0.0 or if it's caused by the
64bit zip stream.
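The masking effect described in this comment can be shown in miniature with plain Java. This is only a sketch: the nested try/finally layout and the first exception message are assumptions made for illustration, not the actual POI or Commons Compress code. It shows how an exception thrown from an outer "finally" replaces the one that actually explains the failure.

```java
import java.io.IOException;

class MaskedExceptionDemo {
    // Stand-in for closing an entry that turned out to be too large.
    // The message is hypothetical.
    static void closeEntry() throws IOException {
        throw new IOException("entry exceeds 4 GB, Zip64 required");
    }

    // Stand-in for finishing an archive whose entry was never closed.
    static void finish() throws IOException {
        throw new IOException("This archive contains unclosed entries.");
    }

    static void write() throws IOException {
        try {
            try {
                // ... copy the entry data ...
            } finally {
                closeEntry();  // fails first, with the real cause
            }
        } finally {
            finish();          // fails too and replaces the first exception
        }
    }

    // Returns the message the caller ends up seeing.
    static String reportedMessage() {
        try {
            write();
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reportedMessage());
    }
}
```

The caller only ever sees the "unclosed entries" message; the size-limit exception is discarded, which matches the stack trace in the original report.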
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
Dominik Stadler <do...@gmx.at> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |regression
--- Comment #2 from Dominik Stadler <do...@gmx.at> ---
Based on discussion on user-list, this worked in 3.17.
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #13 from PJ Fanning <fa...@yahoo.com> ---
Dominik - Zip64Mode.Always is the default in SXSSFWorkbook as of POI 5.0.0
[Bug 62872] Writing large files with 800k rows gives
java.io.IOException: This archive contains unclosed entries.
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62872
--- Comment #6 from Andreas Beeker <ki...@apache.org> ---
Patched via r1845629
The change now implicitly sets the stream entry to 64bit based on the given
file size; the other stream entries stay in normal mode (... I guess ...)
I've compared the sheet content in 3.17 vs trunk, but there weren't any
differences.
As this still produces files which can't be opened in Libre/MS Office, I'm
leaving this issue open.