Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/09/08 21:32:29 UTC

[jira] [Commented] (TIKA-1411) Temporary 7z file leak

    [ https://issues.apache.org/jira/browse/TIKA-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125966#comment-14125966 ] 

Nick Burch commented on TIKA-1411:
----------------------------------

Any chance you could produce a patch file of your proposed fix? It's hard to work out what needs changing from the code you've posted.

> Temporary 7z file leak
> ----------------------
>
>                 Key: TIKA-1411
>                 URL: https://issues.apache.org/jira/browse/TIKA-1411
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>
> When parsing a 7z file, the TikaInputStream created inside PackageParser is never closed, so its temporary file is leaked. Also, the stream is wrapped in a CloseShieldInputStream too early, so it is never recognised as a TikaInputStream and is always wrapped in a BufferedInputStream. Proposed change:
> {code}
> public void parse(
>             InputStream stream, ContentHandler handler,
>             Metadata metadata, ParseContext context)
>             throws IOException, SAXException, TikaException {
>        
>         // Ensure that the stream supports the mark feature
>         if (! TikaInputStream.isTikaInputStream(stream))
>             stream = new BufferedInputStream(stream);
>         
>         
>         TemporaryResources tmp = new TemporaryResources();
>         ArchiveInputStream ais = null;
>         try {
>             ArchiveStreamFactory factory = context.get(ArchiveStreamFactory.class, new ArchiveStreamFactory());
>             // At the end we want to close the archive stream to release
>             // any associated resources, but the underlying document stream
>             // should not be closed
>             ais = factory.createArchiveInputStream(new CloseShieldInputStream(stream));
>             
>         } catch (StreamingNotSupportedException sne) {
>             // Most archive formats work on streams, but a few need files
>             if (sne.getFormat().equals(ArchiveStreamFactory.SEVEN_Z)) {
>                 // Rework as a file, and wrap
>                 stream.reset();
>                 TikaInputStream tstream = TikaInputStream.get(stream, tmp);
>                 
>                 // Pending a fix for COMPRESS-269, this bit is a little nasty
>                 ais = new SevenZWrapper(new SevenZFile(tstream.getFile()));
>                 
>             } else {
>                 tmp.close();
>                 throw new TikaException("Unknown non-streaming format " + sne.getFormat(), sne);
>             }
>         } catch (ArchiveException e) {
>             tmp.close();
>             throw new TikaException("Unable to unpack document stream", e);
>         }
>         MediaType type = getMediaType(ais);
>         if (!type.equals(MediaType.OCTET_STREAM)) {
>             metadata.set(CONTENT_TYPE, type.toString());
>         }
>         // Use the delegate parser to parse the contained document
>         EmbeddedDocumentExtractor extractor = context.get(
>                 EmbeddedDocumentExtractor.class,
>                 new ParsingEmbeddedDocumentExtractor(context));
>         XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>         xhtml.startDocument();
>         try {
>             ArchiveEntry entry = ais.getNextEntry();
>             while (entry != null) {
>                 if (!entry.isDirectory()) {
>                     parseEntry(ais, entry, extractor, xhtml);
>                 }
>                 entry = ais.getNextEntry();
>             }
>         } finally {
>             ais.close();
>             tmp.close();
>         }
>         xhtml.endDocument();
>     }
> {code}
> It would be nice if TIKA-1246 (very simple) were resolved at the same time.
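
For readers following the thread, the cleanup pattern the report describes can be sketched without any Tika classes. This is only an illustration using the JDK: TempFileCleanupSketch, NonClosingStream, sizeViaTempFile and lastSpooled are hypothetical names, not part of Tika or Commons Compress, and NonClosingStream merely stands in for the role CloseShieldInputStream plays in the report. The idea shown is that the temporary file is deleted in a finally block on every path, while the caller's stream is left open.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileCleanupSketch {

    /** Hypothetical stand-in for a close shield: close() leaves the wrapped stream open. */
    static final class NonClosingStream extends FilterInputStream {
        NonClosingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the caller still owns the underlying stream.
        }
    }

    /** Path of the most recent temporary file, kept only so the cleanup can be checked. */
    static Path lastSpooled;

    /**
     * Spools the stream to a temporary file (as a file-only format would require),
     * reads the file size, and always deletes the temporary file afterwards.
     */
    static long sizeViaTempFile(InputStream stream) throws IOException {
        Path spooled = Files.createTempFile("sketch-", ".tmp");
        lastSpooled = spooled;
        try (InputStream shielded = new NonClosingStream(stream)) {
            Files.copy(shielded, spooled, StandardCopyOption.REPLACE_EXISTING);
            return Files.size(spooled);
        } finally {
            // Runs on success and on failure, so the temporary file cannot leak.
            Files.deleteIfExists(spooled);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        long size = sizeViaTempFile(in);
        System.out.println("bytes spooled: " + size);
        System.out.println("temp file still present: " + Files.exists(lastSpooled));
    }
}
```

In the proposed change quoted above, TemporaryResources plays the part of the finally block here: closing it is what removes the file that TikaInputStream spooled to disk.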



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)