Posted to dev@poi.apache.org by bu...@apache.org on 2012/03/27 10:27:31 UTC

DO NOT REPLY [Bug 52991] New: Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

             Bug #: 52991
           Summary: Unexpected end of ZLIB input stream on embedded OLE
                    extraction from PPT
           Product: POI
           Version: 3.8-dev
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSLF
        AssignedTo: dev@poi.apache.org
        ReportedBy: max.valjanski@gmail.com
    Classification: Unclassified


Caused by: org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream
    at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.tika.io.IOUtils.copyLarge(IOUtils.java:933)
    at org.apache.tika.io.IOUtils.copy(IOUtils.java:907)
    at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:536)
    at org.apache.tika.io.TikaInputStream.getFileChannel(TikaInputStream.java:564)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:335)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:152)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:68)
    at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:210)
    at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:122)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:188)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 5 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
    ... 26 more
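
For readers unfamiliar with the root exception in the trace above: the JDK's InflaterInputStream raises it when the underlying stream runs out of bytes before the compressed data is complete. The following is a minimal, generic sketch of that condition using only java.util.zip. It is not a reproduction of the POI bug; the class name TruncatedZlibDemo, its helper methods, and the test data are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class, not part of POI or Tika.
public class TruncatedZlibDemo {

    /** Compresses the input into a complete zlib stream. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    /**
     * Reads the compressed data to the end.
     * Returns null on success, or the EOFException message if the
     * compressed stream ends before the inflater is finished.
     */
    static String inflateAndReport(byte[] compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // Discard the decompressed bytes; only the outcome matters here.
            }
            return null;
        } catch (EOFException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary sample data.
        byte[] raw = new byte[10000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i * 31 % 251);
        }
        byte[] full = deflate(raw);
        // Simulate a caller that hands over only part of the compressed bytes.
        byte[] truncated = Arrays.copyOf(full, full.length / 2);

        System.out.println("complete stream:  " + inflateAndReport(full));
        System.out.println("truncated stream: " + inflateAndReport(truncated));
    }
}
```

The complete stream inflates without error, while the truncated copy fails with the same message as in the trace. Whether the embedded OLE data in the affected PPT files is truncated in the file itself or only cut short by the reader is not something this sketch can tell.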

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

--- Comment #5 from EM <eu...@kontextwork.de> ---
I used:
svn checkout https://svn.apache.org/repos/asf/tika/trunk tika.trunk

then "mvn install". Not sure about POI, is that an extra lib? Does Maven not
fetch it properly, or is it not included in the source?

Should I build it again and provide you the logs on a pastebin?


DO NOT REPLY [Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

Maxim Valyanskiy <ma...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Maxim Valyanskiy <ma...@gmail.com> 2012-03-27 08:32:57 UTC ---
fixed in r1305778



[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

--- Comment #4 from Maxim Valyanskiy <ma...@gmail.com> ---
Are you sure that your Tika is built with the latest version of POI? The stack
trace looks like it was produced by a build without my fix.
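
One way to check which POI jar a given Tika installation actually loads is to ask the JVM where a POI class came from. The sketch below is only an illustration: the class WhichJar and its describe method are hypothetical, and org.apache.poi.poifs.filesystem.POIFSFileSystem is used on the assumption that it is present in the POI jar on the classpath. The reported version comes from the jar manifest and may be null if the manifest does not declare one.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper, not part of POI or Tika.
public class WhichJar {

    /** Describes where a class was loaded from and the version its jar manifest declares. */
    static String describe(String className) {
        Class<?> c;
        try {
            // Load without initializing, so no static initializers run.
            c = Class.forName(className, false, WhichJar.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return className + ": not on classpath";
        }
        // The code source is null for classes loaded by the bootstrap loader.
        CodeSource src = c.getProtectionDomain().getCodeSource();
        URL location = (src == null) ? null : src.getLocation();
        Package p = c.getPackage();
        String version = (p == null) ? null : p.getImplementationVersion();
        return className + ": location=" + location + ", version=" + version;
    }

    public static void main(String[] args) {
        // Assumed class name; any class shipped in the POI jar would do.
        System.out.println(describe("org.apache.poi.poifs.filesystem.POIFSFileSystem"));
    }
}
```

Run it with the same classpath as the Tika build in question. If the location points at a released POI jar rather than one built from trunk, the fix is not included.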


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

--- Comment #3 from EM <eu...@kontextwork.de> ---
I just researched a bit on the net: several people are running into this because
of "broken archives". I verified that I can unzip the ppt without issues using
unzip.

> unzip <file.ppt>
Archive:  <file.ppt> 
Warning[<file.ppt>]: 1865926 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: [Content_Types].xml     
  inflating: _rels/.rels             
  inflating: drs/picturexml.xml      
  inflating: drs/_rels/picturexml.xml.rels  
  inflating: drs/downrev.xml         
 extracting: drs/media/image1.png    

I am not sure whether the warning is one of those issues or related; I just
wanted to provide the information I got. Basically, this happens with a lot of
ppts here. We are using Tika + Solr to index attachments.


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

Andreas Beeker <an...@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Andreas Beeker <an...@gmx.de> ---
I assume this is fixed. If you reopen it, please attach a test file.



[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

Eduard Dudar <ed...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eduard.dudar@gmail.com



[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

--- Comment #6 from Maxim Valyanskiy <ma...@gmail.com> ---
This is the Bugzilla of the POI project, not Tika :-)

Tika uses POI as a dependency in the tika-parsers module. The bug was fixed in
an unreleased version of POI, so you need to build your own version (or wait
for the next release).

If you want to build Tika with POI, then:

1) Build POI

2) Install the POI artifacts to your local Maven repository

3) Update the POI version in tika-parsers/pom.xml

4) Build Tika


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

AndreU <an...@kontextwork.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andre.ulrich@kontextwork.de


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

EM <eu...@kontextwork.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eugen.mayer@kontextwork.de


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

--- Comment #7 from EM <eu...@kontextwork.de> ---
Oh, I am sorry! I got totally confused here because the bug tracking system is
rather "general".

Will do what you suggested, thanks a lot!


[Bug 52991] Unexpected end of ZLIB input stream on embedded OLE extraction from PPT

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52991

RM <eu...@kontextwork.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #2 from RM <eu...@kontextwork.de> ---
Verified with the current trunk, revision 1337825: not fixed yet.

The source is a ppt, error is exactly the same:
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@bd928a
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:248)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:126)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:395)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:97)
Caused by: org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream
    at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at java.io.FilterInputStream.read(FilterInputStream.java:90)
    at org.apache.tika.io.IOUtils.copyLarge(IOUtils.java:933)
    at org.apache.tika.io.IOUtils.copy(IOUtils.java:907)
    at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:536)
    at org.apache.tika.io.TikaInputStream.getFileChannel(TikaInputStream.java:564)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:335)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:152)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:68)
    at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:236)
    at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:117)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:188)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 5 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
    ... 26 more


---------
Debian Squeeze with Tika built from source (also tried 1.0 and 1.1)
