Posted to dev@poi.apache.org by bu...@apache.org on 2012/03/27 10:27:31 UTC
DO NOT REPLY [Bug 52991] New: Unexpected end of ZLIB input stream on
embedded OLE extraction from PPT
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
Bug #: 52991
Summary: Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Product: POI
Version: 3.8-dev
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: HSLF
AssignedTo: dev@poi.apache.org
ReportedBy: max.valjanski@gmail.com
Classification: Unclassified
Caused by: org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream
at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103)
at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.tika.io.IOUtils.copyLarge(IOUtils.java:933)
at org.apache.tika.io.IOUtils.copy(IOUtils.java:907)
at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:536)
at org.apache.tika.io.TikaInputStream.getFileChannel(TikaInputStream.java:564)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:335)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:152)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:68)
at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:210)
at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:122)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:188)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 5 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
... 26 more
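The innermost cause in this trace comes from `java.util.zip.InflaterInputStream.fill`, which throws `EOFException` with this exact message when the inflater still needs input but the underlying stream has already ended, i.e. the zlib-compressed data stops before the deflate stream and its checksum trailer are complete. The minimal sketch below reproduces the same exception by cutting a valid zlib buffer in half. The class and method names are hypothetical; it only illustrates the JDK behaviour and makes no claim about how the data in this particular PPT was damaged or how POI reads it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo: reproduces "Unexpected end of ZLIB input stream"
// by feeding InflaterInputStream a zlib buffer that has been cut short.
class TruncatedZlibDemo {

    // Compress with the default zlib format (header + deflate data + Adler-32 trailer).
    static byte[] zlibCompress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Inflate the given bytes completely and return the decompressed length.
    static int inflateAll(byte[] compressed) throws IOException {
        int total = 0;
        byte[] buf = new byte[512];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Keep only the first half of the compressed bytes and try to inflate them.
    // Returns the EOFException message, or null if inflation unexpectedly succeeds.
    static String readTruncated(byte[] original) throws IOException {
        byte[] compressed = zlibCompress(original);
        byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
        try {
            inflateAll(truncated);
            return null;
        } catch (EOFException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "embedded OLE payload ".repeat(200).getBytes(StandardCharsets.US_ASCII);
        System.out.println(inflateAll(zlibCompress(text))); // intact stream
        System.out.println(readTruncated(text));            // truncated stream
    }
}
```

Running `main` prints 4200 for the intact stream and then `Unexpected end of ZLIB input stream` for the truncated one (requires Java 11+ for `String.repeat`).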
--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
--- Comment #5 from EM <eu...@kontextwork.de> ---
I used:
svn checkout https://svn.apache.org/repos/asf/tika/trunk tika.trunk
and then ran "mvn install". I am not sure about POI: is that a separate library? Does Maven not
fetch it properly, or is it not included in the source?
Should I build it again and provide you the logs on a pastebin?
DO NOT REPLY [Bug 52991] Unexpected end of ZLIB input stream on
embedded OLE extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
Maxim Valyanskiy <ma...@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
--- Comment #1 from Maxim Valyanskiy <ma...@gmail.com> 2012-03-27 08:32:57 UTC ---
Fixed in r1305778.
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
--- Comment #4 from Maxim Valyanskiy <ma...@gmail.com> ---
Are you sure that your Tika is built with the latest version of POI? The stack trace
looks like it was produced by a build without my fix.
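One way to answer this question is to ask the running JVM which jar it actually loaded POI from. The sketch below is a hypothetical helper, not part of Tika or POI. It assumes POI exposes a static `org.apache.poi.Version.getVersion()` and calls it by reflection, so the helper compiles and runs even when POI is absent.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from, so you
// can check which POI jar actually ended up on Tika's runtime classpath.
class ClasspathCheck {

    // Returns the jar/directory URL a class was loaded from, or a short note.
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null) {
                return "(bootstrap / no code source)";
            }
            URL url = src.getLocation();
            return url == null ? "(unknown location)" : url.toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    // Assumption: org.apache.poi.Version.getVersion() exists. Called reflectively;
    // returns null if POI is absent or the call fails.
    static String poiVersion() {
        try {
            Class<?> version = Class.forName("org.apache.poi.Version");
            Method m = version.getMethod("getVersion");
            return (String) m.invoke(null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("POI version: " + poiVersion());
        System.out.println("POI loaded from: " + locationOf("org.apache.poi.Version"));
    }
}
```

Launched with the same classpath that Tika uses, the printed location shows whether the freshly built POI jar or an older copy is being picked up.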
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
--- Comment #3 from EM <eu...@kontextwork.de> ---
I just researched a bit on the net; several people run into this because of
"broken archives". I verified that I can unzip the PPT without issues using
unzip:
> unzip <file.ppt>
Archive: <file.ppt>
Warning[<file.ppt>]: 1865926 extra bytes at beginning or within zipfile
(attempting to process anyway)
inflating: [Content_Types].xml
inflating: _rels/.rels
inflating: drs/picturexml.xml
inflating: drs/_rels/picturexml.xml.rels
inflating: drs/downrev.xml
extracting: drs/media/image1.png
I am not sure whether the warning is one of those issues or related; I just wanted to
provide the information I got. Basically this happens with a lot of PPTs
here. We are using Tika + Solr to index attachments.
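The unzip warning fits a file that is not itself a ZIP archive but contains one: a binary .ppt is an OLE2 compound document whose header starts with the bytes `D0 CF 11 E0 A1 B1 1A E1`, while ZIP entries start with the local-file-header signature `PK\3\4`. As a rough diagnostic, the sketch below (a hypothetical helper, not a POI or Tika API) reports which signature a file starts with and the offset of the first `PK\3\4`. Those four bytes can also occur by chance in binary data, and OLE2 streams are not necessarily stored contiguously, so the offset is only a hint and need not match the 1865926 bytes unzip reports.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic: distinguishes an OLE2 (.ppt) container from a ZIP
// (.pptx) container and finds the first ZIP local-file-header signature.
class ContainerSniffer {

    // OLE2 / Compound File Binary header signature.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature "PK\003\004".
    static final byte[] ZIP_LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
            && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Naive scan: offset of the first occurrence of pattern, or -1.
    static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    static String describe(byte[] data) {
        String kind = startsWith(data, OLE2_MAGIC) ? "OLE2"
                    : startsWith(data, ZIP_LOCAL_HEADER) ? "ZIP" : "unknown";
        return kind + ", first PK\\3\\4 at offset " + indexOf(data, ZIP_LOCAL_HEADER);
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory; fine for a one-off check.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describe(data));
    }
}
```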
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
Andreas Beeker <an...@gmx.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution|--- |FIXED
--- Comment #8 from Andreas Beeker <an...@gmx.de> ---
I assume this is fixed. If you reopen it, please attach a test file.
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
Eduard Dudar <ed...@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |eduard.dudar@gmail.com
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
--- Comment #6 from Maxim Valyanskiy <ma...@gmail.com> ---
This is the Bugzilla of the POI project, not Tika :-)
Tika uses POI as a dependency in the tika-parsers module. The bug was fixed in an unreleased
version of POI, so you need to build your own version (or wait for the next
release).
If you want to build Tika with POI, then:
1) Build POI
2) Install the POI artifacts to your local Maven repository
3) Update the POI version in tika-parsers/pom.xml
4) Build Tika
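Between steps 2 and 3 it can help to confirm that the install really landed where Maven will look. By default the local repository is `~/.m2/repository`, laid out as `<groupId as path>/<artifactId>/<version>/<artifactId>-<version>.jar`; HSLF, the component this bug is filed against, ships in the `poi-scratchpad` artifact next to `poi`. The hypothetical helper below checks both jars. The version string `3.9-SNAPSHOT` is only a placeholder and has to match whatever your POI build produced and what you put into tika-parsers/pom.xml.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: computes where `mvn install` puts an artifact in the
// default local repository layout and checks whether the jar is present.
class LocalRepoCheck {

    // "org.apache.poi" + "poi" + "X" -> org/apache/poi/poi/X/poi-X.jar under repo
    static Path jarPath(Path repo, String groupId, String artifactId, String version) {
        Path dir = repo;
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(artifactId).resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Default local repository; a custom <localRepository> in settings.xml overrides it.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // Placeholder version (assumption): pass the real one as the first argument.
        String version = args.length > 0 ? args[0] : "3.9-SNAPSHOT";
        for (String artifact : new String[] {"poi", "poi-scratchpad"}) {
            Path jar = jarPath(repo, "org.apache.poi", artifact, version);
            System.out.println((Files.exists(jar) ? "found   " : "MISSING ") + jar);
        }
    }
}
```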
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
AndreU <an...@kontextwork.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |andre.ulrich@kontextwork.de
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
EM <eu...@kontextwork.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |eugen.mayer@kontextwork.de
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
--- Comment #7 from EM <eu...@kontextwork.de> ---
Oh holy, I am sorry! I totally got confused here due to the rather "general"
bug tracking system.
I will do what you suggested, thank you a lot!
[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE
extraction from PPT
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991
RM <eu...@kontextwork.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |---
--- Comment #2 from RM <eu...@kontextwork.de> ---
Verified with the current trunk, revision 1337825: not fixed yet.
The source is a PPT; the error is exactly the same:
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@bd928a
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:248)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:126)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:395)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:97)
Caused by: org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream
at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103)
at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.tika.io.IOUtils.copyLarge(IOUtils.java:933)
at org.apache.tika.io.IOUtils.copy(IOUtils.java:907)
at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:536)
at org.apache.tika.io.TikaInputStream.getFileChannel(TikaInputStream.java:564)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:335)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:152)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:68)
at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:236)
at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:117)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:188)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 5 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
... 26 more
---------
Debian Squeeze with Tika built from source (also tried 1.0 and 1.1)