Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2020/10/07 19:13:00 UTC

[jira] [Comment Edited] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

    [ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209786#comment-17209786 ] 

Tim Allison edited comment on TIKA-3203 at 10/7/20, 7:12 PM:
-------------------------------------------------------------

I'm running through a debugger now, and it isn't looking like we're creating a tmp file per box type at the Tika level.

I understand that it is frustrating that the MP4Parser doesn't use an existing TikaInputStream if you pass one in with your custom TemporaryResources...that is an easy fix.  I also understand that for your use case relying on {{java.io.tmpdir}} is a non-starter.

Let me look some more at the underlying mp4parser...


was (Author: tallison@mitre.org):
I'm running through a debugger now, and it isn't looking like we're creating a tmp file per box type at the Tika level.

I understand that it is frustrating that the MP4Parser doesn't use an existing TikaInputStream if you pass one in with your custom TemporaryResources.  I also understand that for your use case relying on {{java.io.tmpdir}} is a non-starter.

> MP4Parser temporary files are not deleted from Tomcat temp folder
> -----------------------------------------------------------------
>
>                 Key: TIKA-3203
>                 URL: https://issues.apache.org/jira/browse/TIKA-3203
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>         Environment: CentOS 7.8
> Tomcat webapp
> OpenJDK JRE 11.0.5
>            Reporter: Isabelle Giguere
>            Priority: Major
>
> In our application, Tika is used as part of a Tomcat webapp.  Tomcat sets its temp folder ($CATALINA_HOME/temp) as "java.io.tmpdir".  The MP4Parser creates files in java.io.tmpdir.  
> The files created by the MP4Parser are never deleted from temp/.  Ex: MediaDataBox10544109451805035303org.mp4parser.boxes.iso14496.part12.MediaDataBox@77cb1ee8
> Oddly, there are no errors in logs.  Nothing about files that cannot be deleted or not found.
> Other processes in our application need to create other files in temp/, so we can't simply delete everything in that folder.
> I assume from TIKA-1040, TIKA-1361, and TIKA-3084 that most file deletion issues in the MP4Parser have been fixed.  This may be a little gremlin in CentOS or in Tomcat...?
> I have tried using TemporaryResources (i.e., replacing the "TikaInputStream.get" call in the code below with TikaInputStream.get(InputStream, TemporaryResources)) to put the parser's temporary files in a folder that we can control, but to no avail.  Tika's MP4Parser "parse" method initializes a new instance of TemporaryResources, so the TemporaryResources that I created is never used.  The default TemporaryResources would use java.io.tmpdir anyway, right?
> So, why aren't these files deleted?
> And, while we are on the subject, there should be a way to set a temporary files folder that parsers (and their dependencies) actually use.  How can a user-defined TemporaryResources be useful if the parser ignores it?
> Relevant code:
> {code}
> Parser parser = new AutoDetectParser(); // injected by Spring
> Path input = ...; // some mp4 audio file
> Path output = ...;
> final Metadata metadata = new Metadata();
> try(InputStream stream = TikaInputStream.get(input, metadata);
>     OutputStream outputstream = new FileOutputStream(output.toFile());
>     OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputstream, "UTF-8")){
> 	ParseContext parseContext = new ParseContext();
> 	
> 	parser.parse(stream, new BodyContentHandler(outputStreamWriter), metadata, parseContext);
> 	
> 	// do something with the metadata and the output
> }
> {code}
> Note that I also tried to set java.io.tmpdir to another folder, programmatically.  That had no effect either.  Since the application needs to use Tomcat's temp folder for other processing, setting java.io.tmpdir on the command line is not an option.
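
On the last point in the description: the Javadoc for java.io.File.createTempFile notes that programmatic changes to java.io.tmpdir are not guaranteed to have any effect on the directory it uses, which may explain why setting the property at runtime did nothing. The following is a minimal sketch using only the JDK, not Tika or mp4parser code. It shows two things a caller can control: creating scratch files in an explicit directory instead of relying on java.io.tmpdir, and a stopgap sweep that deletes only files with a given name prefix. The class name TempDirSketch, the method names, and the "scratch-" prefix are hypothetical; the "MediaDataBox" prefix is taken from the file name quoted in the description.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only; not part of Tika or mp4parser.
public class TempDirSketch {

    // Creates a temp file in a caller-chosen directory, bypassing java.io.tmpdir.
    public static Path createScratchFile(Path scratchDir) throws IOException {
        Files.createDirectories(scratchDir);
        return Files.createTempFile(scratchDir, "scratch-", ".tmp");
    }

    // Deletes regular files directly under dir whose names start with prefix.
    // Returns the number of files deleted; other files are left alone.
    public static int sweep(Path dir, String prefix) throws IOException {
        int deleted = 0;
        DirectoryStream.Filter<Path> byPrefix =
                entry -> entry.getFileName().toString().startsWith(prefix);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, byPrefix)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && Files.deleteIfExists(entry)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a directory the application controls.
        Path dir = Files.createTempDirectory("tempdir-sketch");

        Path scratch = createScratchFile(dir);
        System.out.println("scratch file created in: " + scratch.getParent());

        // Simulate one leftover parser file and one unrelated file.
        Files.createFile(dir.resolve("MediaDataBox12345"));
        Files.createFile(dir.resolve("other-process.dat"));

        int removed = sweep(dir, "MediaDataBox");
        System.out.println("leftover files removed: " + removed);
    }
}
```

A sweep like this does not address why the files are left behind, and as written it would also delete a file that a parse in progress is still using, so a real deployment would need some guard (for example, only sweeping while no parse is running).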



--
This message was sent by Atlassian Jira
(v8.3.4#803005)