Posted to dev@tika.apache.org by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2011/05/18 15:57:48 UTC

[jira] [Created] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

Prevent creating of ZipInputStreamZipEntrySource when reading files from disk
-----------------------------------------------------------------------------

                 Key: TIKA-662
                 URL: https://issues.apache.org/jira/browse/TIKA-662
             Project: Tika
          Issue Type: Improvement
            Reporter: Maxim Valyanskiy


POI provides two ways to open an OPCPackage: from an InputStream and from a File. Creating an OPCPackage from an InputStream causes the creation of a ZipInputStreamZipEntrySource, which buffers all uncompressed data in memory. This takes a lot of memory and is not needed when we are reading files from disk or when we have already copied the stream into a temporary file.

This patch removes usage of ZipInputStreamZipEntrySource in this case.
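
To illustrate the memory difference described above, here is a minimal sketch that uses only java.util.zip rather than POI itself. The class and method names (ZipAccessSketch, bufferAllEntries, readSingleEntry, writeSampleZip) are hypothetical and are not part of POI, Tika, or the attached patch. The stream-based method has to inflate and hold every entry, while the file-based method can look up and inflate a single entry on demand.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not POI's or Tika's implementation.
class ZipAccessSketch {

    // Reads the stream until it reports end-of-data and returns the bytes.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Stream-based access: a ZIP stream is sequential, so keeping entries
    // available for later lookup means inflating and buffering all of them.
    static Map<String, byte[]> bufferAllEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                entries.put(entry.getName(), drain(zin)); // read() ends at entry end
            }
        }
        return entries;
    }

    // File-based access: the central directory allows random access, so only
    // the requested entry is inflated. Returns null if the entry is absent.
    static byte[] readSingleEntry(File file, String name) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return drain(in);
            }
        }
    }

    // Writes a tiny two-entry ZIP to a temporary file for the demo.
    static File writeSampleZip() throws IOException {
        File file = File.createTempFile("sketch", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(file.toPath()))) {
            for (String name : new String[] {"[Content_Types].xml", "word/document.xml"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write(("<part name='" + name + "'/>").getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File file = writeSampleZip();
        try (InputStream in = new FileInputStream(file)) {
            System.out.println("buffered entries: " + bufferAllEntries(in).keySet());
        }
        System.out.println("single entry bytes: "
                + readSingleEntry(file, "word/document.xml").length);
    }
}
```

With a large document, the map returned by bufferAllEntries grows with the total uncompressed size, whereas readSingleEntry only ever holds the one part that was asked for.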

Unfortunately, it breaks ZIP-bomb prevention for the OOXML parser (and other parsers that use TikaInputStream.getFile()). I think that ZIP-bomb prevention should be additionally implemented for those formats before committing this to SVN.
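
One possible shape for such a check on file-based access is sketched below, again using only java.util.zip. This is an assumption for illustration, not Tika's actual ZIP-bomb prevention and not part of the patch: the class ZipBombGuardSketch, its methods, and the two limits (maxTotalBytes, maxRatio) are all hypothetical. The idea is to count inflated bytes while reading and to abort once the total or the expansion ratio relative to the file size exceeds a limit.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not Tika's ZIP-bomb prevention.
class ZipBombGuardSketch {

    // Inflates every entry while counting bytes. Throws IOException as soon as
    // the running total exceeds maxTotalBytes, or the ratio of inflated bytes
    // to the on-disk file size exceeds maxRatio. Returns the inflated total.
    static long checkedInflatedSize(File file, long maxTotalBytes, long maxRatio)
            throws IOException {
        long compressedSize = Math.max(1, file.length()); // avoid division by zero
        long total = 0;
        byte[] chunk = new byte[8192];
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (InputStream in = zip.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        total += n;
                        if (total > maxTotalBytes || total / compressedSize > maxRatio) {
                            throw new IOException("Suspected ZIP bomb in entry "
                                    + entry.getName() + " after " + total + " bytes");
                        }
                    }
                }
            }
        }
        return total;
    }

    // Writes a single-entry ZIP of zero bytes, which compresses extremely well.
    static File writeZip(int uncompressedBytes) throws IOException {
        File file = File.createTempFile("sketch", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(file.toPath()))) {
            out.putNextEntry(new ZipEntry("data.bin"));
            out.write(new byte[uncompressedBytes]);
            out.closeEntry();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File small = writeZip(1000);
        System.out.println("small file inflated bytes: "
                + checkedInflatedSize(small, 100_000_000L, 100));
        File suspicious = writeZip(10_000_000);
        try {
            checkedInflatedSize(suspicious, 100_000_000L, 100);
            System.out.println("no limit exceeded");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The counts come from bytes actually read rather than from ZipEntry.getSize(), since the sizes recorded in a ZIP header are supplied by whoever created the archive and may be missing or wrong.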

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

Posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Valyanskiy resolved TIKA-662.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0

This issue almost duplicates TIKA-645. The remaining few lines were committed in r1124577.



[jira] [Updated] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

Posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Valyanskiy updated TIKA-662:
----------------------------------

    Attachment: TIKA-662.patch

patch

