Posted to dev@tika.apache.org by "Berry van Ginkel (JIRA)" <ji...@apache.org> on 2011/07/31 11:49:09 UTC

[jira] [Created] (TIKA-687) Temporary file not removed after detection

Temporary file not removed after detection
------------------------------------------

                 Key: TIKA-687
                 URL: https://issues.apache.org/jira/browse/TIKA-687
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0
         Environment: Windows
            Reporter: Berry van Ginkel


Temporary files created by Tika are not removed when the TikaInputStream was created from a byte array or a BufferedInputStream and the ZipContainerDetector is used (in our case for Office 2007 documents).
The fix for bug TIKA-654 solves part of the problem (when a file is used as input). When a byte array is used, however, TikaInputStream creates a temp file when getFile() is called. This file should be removed when close() is called, but the ZipDetector instantiates a ZipFile, which opens its own stream to the same temp file. That stream is never closed, and therefore the file cannot be deleted when TikaInputStream.close() is called.
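
To illustrate the failure mode described above, here is a minimal sketch that uses only the JDK and no Tika classes. The class name TempZipCleanup and the helpers sampleZip(), spoolToTempFile() and listEntries() are hypothetical names made up for this example; spoolToTempFile() only stands in for the temp file that the report says TikaInputStream creates in getFile(). It is not the attached patch. The point it shows is that a ZipFile opened on the temp file has to be closed before the file is deleted; on Windows an open ZipFile keeps a handle on the file, so the delete fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch class; none of these names come from Tika.
class TempZipCleanup {

    // Builds a tiny zip archive in memory to play the role of the byte array input.
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry("[Content_Types].xml"));
            out.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Stand-in for the temp file that is created when a file view of
    // in-memory data is requested.
    public static Path spoolToTempFile(byte[] data) throws IOException {
        Path temp = Files.createTempFile("sketch-", ".tmp");
        Files.write(temp, data);
        return temp;
    }

    // Reads the entry names and closes the ZipFile before returning.
    // Leaving the ZipFile open here is the kind of leak the report describes.
    public static List<String> listEntries(Path zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path temp = spoolToTempFile(sampleZip());
        try {
            System.out.println(listEntries(temp));
        } finally {
            // Throws an IOException if the file cannot be removed.
            Files.delete(temp);
        }
        System.out.println("temp file still exists: " + Files.exists(temp));
    }
}
```

Files.delete() is used instead of File.delete() so that a failed delete surfaces as an exception rather than a silently ignored return value. On Unix-like systems a file can usually be deleted while it is still open, which may be why the problem was reported on Windows.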


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (TIKA-687) Temporary file not removed after detection

Posted by "Berry van Ginkel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berry van Ginkel updated TIKA-687:
----------------------------------

    Description: 
Temporary files created by Tika are not removed when the TikaInputStream was created from a byte array or a BufferedInputStream and the ZipContainerDetector is used (in our case for Office 2007 documents).
The fix for bug TIKA-654 solves part of the problem (when a file is used as input). When a byte array is used, however, TikaInputStream creates a temp file when getFile() is called. This file should be removed when close() is called, but the ZipDetector instantiates a ZipFile, which opens its own stream to the same temp file. That stream is never closed, and therefore the file cannot be deleted when TikaInputStream.close() is called.

See the attached patch for a unit test and a solution.



[jira] [Commented] (TIKA-687) Temporary file not removed after detection

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095359#comment-13095359 ] 

Michael McCandless commented on TIKA-687:
-----------------------------------------

I think this may have been fixed by TIKA-701?  But it'd be nice to commit the unit tests in this patch...


[jira] [Resolved] (TIKA-687) Temporary file not removed after detection

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-687.
--------------------------------

    Resolution: Duplicate
      Assignee: Jukka Zitting

Right, sorry for overlooking this issue! The proposed solution is indeed included in the TIKA-701 changes, so resolving as a duplicate.

I committed the test case with slight modifications in revision 1164183. Thanks!


[jira] [Updated] (TIKA-687) Temporary file not removed after detection

Posted by "Berry van Ginkel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berry van Ginkel updated TIKA-687:
----------------------------------

    Attachment: tika-temp-files.patch

Unit test and patch for ZipContainerDetector
