Posted to dev@tika.apache.org by "Isabelle Giguere (Jira)" <ji...@apache.org> on 2020/09/22 21:36:00 UTC
[jira] [Updated] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder
[ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Isabelle Giguere updated TIKA-3203:
-----------------------------------
Description:
In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its temp folder ($CATALINA_HOME/temp) as "java.io.tmpdir". The MP4Parser creates files in java.io.tmpdir.
The files created by the MP4Parser are never deleted from temp/. Example: MediaDataBox10544109451805035303org.mp4parser.boxes.iso14496.part12.MediaDataBox@77cb1ee8
Oddly, the logs show no errors: nothing about files that could not be deleted or were not found.
Other processes in our application need to create other files in temp/, so we can't simply delete everything in that folder.
I assume from TIKA-1040, TIKA-1361, and TIKA-3084 that most file deletion issues in the MP4Parser have been fixed. This may be a little gremlin in CentOS or in Tomcat ... ?
I have tried using TemporaryResources (i.e., replacing the "TikaInputStream.get" call in the code below with TikaInputStream.get(InputStream, TemporaryResources)) to put the parser's temporary files in a folder that we control, but to no avail. Tika's MP4Parser "parse" method initializes a new instance of TemporaryResources, so the TemporaryResources that I created is never used. The default TemporaryResources would use java.io.tmpdir anyway, right?
So, why aren't these files deleted?
And, while we are on the subject, there should be a way to set a temporary-files folder that parsers (and their dependencies) actually use. How can a user-defined TemporaryResources be useful if the parser ignores it?
Relevant code:
{code}
Parser parser = new AutoDetectParser(); // injected by Spring
Path input = ...; // some mp4 audio file
Path output = ...;
final Metadata metadata = new Metadata();
try (InputStream stream = TikaInputStream.get(input, metadata);
     OutputStream outputstream = new FileOutputStream(output.toFile());
     OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputstream, "UTF-8")) {
    ParseContext parseContext = new ParseContext();
    parser.parse(stream, new BodyContentHandler(outputStreamWriter), metadata, parseContext);
    // do something with the metadata and the output
}
{code}
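Until the root cause is found, one possible stopgap is to remove the leftover files selectively by name prefix instead of clearing all of temp/. The following is only a minimal sketch using the JDK alone. It assumes that the leftover files always start with "MediaDataBox" (as in the example file name above) and that a file untouched for some minimum age is no longer in use. The class name, method name, and age threshold are hypothetical and are not part of Tika.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not part of Tika): removes leftover temp files by name prefix. */
final class StaleTempFileCleaner {

    private StaleTempFileCleaner() {}

    /**
     * Deletes regular files directly inside dir whose names start with prefix
     * and whose last-modified time is at least minAge in the past.
     * Files with other names are never touched. Returns the number of files deleted.
     */
    static int deleteStale(Path dir, String prefix, Duration minAge) throws IOException {
        Instant cutoff = Instant.now().minus(minAge);
        int deleted = 0;
        // Filter on the plain file name so that only matching files are considered.
        try (DirectoryStream<Path> candidates = Files.newDirectoryStream(
                dir, p -> p.getFileName().toString().startsWith(prefix))) {
            for (Path candidate : candidates) {
                if (!Files.isRegularFile(candidate)) {
                    continue; // skip directories and other non-regular entries
                }
                FileTime modified = Files.getLastModifiedTime(candidate);
                // Keep recent files: a parse may still be using them.
                if (!modified.toInstant().isAfter(cutoff) && Files.deleteIfExists(candidate)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

It could be called periodically, for example as StaleTempFileCleaner.deleteStale(Paths.get(System.getProperty("java.io.tmpdir")), "MediaDataBox", Duration.ofHours(1)). This only hides the symptom; it does not explain why the parser leaves the files behind.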
> MP4Parser temporary files are not deleted from Tomcat temp folder
> -----------------------------------------------------------------
>
> Key: TIKA-3203
> URL: https://issues.apache.org/jira/browse/TIKA-3203
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Environment: CentOS 7.8
> Tomcat webapp
> Reporter: Isabelle Giguere
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)