Posted to fop-dev@xmlgraphics.apache.org by Alexios Giotis <al...@gmail.com> on 2013/02/07 18:36:15 UTC

Trouble with temporary files after the merge of Temp_URI_Resolution branch

Hi,

In certain cases FOP needs to write temporary files. For example, org.apache.fop.afp.AFPStreamer needs to concatenate the AFP resources with the main document. After the vote and merge of the Temp_URI_Resolution branch (Sept 2012), the pattern for using temp files changed from:

File tmpFile = File.createTempFile(....);
// Write and read from the file
tmpFile.delete();

to:
File tmpFile = new File(System.getProperty("java.io.tmpdir"), counterStartingFrom1AsString);
tmpFile.deleteOnExit();
// Write and read from the file


This introduces a number of bad side effects:
1. Different FOP processes can't be executed in parallel on the same system because creating the same temp file fails.

2. If the JVM is not normally terminated, the temp files are never deleted and the next invocation of the JVM fails to run.

3. deleteOnExit() keeps an unknown number of temp files on disk, plus a reference to each of them in memory, for the life of the JVM.
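
As a minimal sketch of how the earlier pattern avoids these effects (this is not the actual FOP code; the class name TempFileSketch, the method roundTrip and the "fop-sketch-" prefix are made up for illustration), a uniquely named temp file that is deleted explicitly could look like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of FOP or XGC.
class TempFileSketch {

    /** Writes data to a uniquely named temp file, reads it back, and always deletes the file. */
    static byte[] roundTrip(byte[] data) throws IOException {
        // createTempFile picks an unused name, so parallel processes do not
        // collide and a leftover file from a crashed JVM does not block the next run.
        Path tmp = Files.createTempFile("fop-sketch-", ".tmp");
        try {
            Files.write(tmp, data);
            return Files.readAllBytes(tmp);
        } finally {
            // Explicit deletion: nothing is kept on disk or registered in
            // memory until the JVM exits.
            Files.deleteIfExists(tmp);
        }
    }
}
```

The try/finally block is the important part: the file is removed as soon as the work is done, even if writing or reading fails.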


I am volunteering to prepare a patch for both XGC and FOP to fix these issues. I was thinking of adding a cleanup() method to the org.apache.xmlgraphics.io.TempResourceResolver interface so that it is notified when to delete the temp files, moving the isTempURI() and generate() methods of the TempResourceURIGenerator class there, and then deleting that class.
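
To make the idea more concrete, here is a rough sketch under my own assumptions. The interface CleanableTempResolver, the class LocalTempResolver and all method signatures below are hypothetical and only illustrate the shape of a resolver with a cleanup() notification; they are not the existing XGC API and not the final patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface for illustration; not the actual TempResourceResolver. */
interface CleanableTempResolver {
    OutputStream getOutputStream(String id) throws IOException;
    InputStream getInputStream(String id) throws IOException;
    /** Notification that the temp files are no longer needed. */
    void cleanup() throws IOException;
}

/** Hypothetical implementation that keeps temp files on the local disk. */
class LocalTempResolver implements CleanableTempResolver {
    // Maps the caller's id to the uniquely named file backing it.
    private final Map<String, Path> files = new ConcurrentHashMap<>();

    @Override
    public OutputStream getOutputStream(String id) throws IOException {
        Path tmp = Files.createTempFile("fop-sketch-", ".tmp");
        Path previous = files.put(id, tmp);
        if (previous != null) {
            Files.deleteIfExists(previous); // do not leak a replaced file
        }
        return Files.newOutputStream(tmp);
    }

    @Override
    public InputStream getInputStream(String id) throws IOException {
        Path tmp = files.get(id);
        if (tmp == null) {
            throw new IOException("Unknown temp resource: " + id);
        }
        // The file outlives the stream, so it can be read more than once.
        return Files.newInputStream(tmp);
    }

    @Override
    public void cleanup() throws IOException {
        for (Path p : files.values()) {
            Files.deleteIfExists(p);
        }
        files.clear();
    }
}
```

With something like this, the caller would invoke cleanup() once at the end of a rendering run, and the lifetime of the temp files would no longer depend on JVM exit.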

I am also tempted to delete org.apache.xmlgraphics.io.Resource and, in ResourceResolver, change
Resource getResource(URI uri)
to 
InputStream getInputStream(URI uri)


Although in [1] there is a reason for having it, in practice the "type" is used nowhere. 


WDYT ?

Alexis Giotis




[1] http://wiki.apache.org/xmlgraphics-fop/URIResolution

Re: Trouble with temporary files after the merge of Temp_URI_Resolution branch

Posted by Alexios Giotis <al...@gmail.com>.
Hi Peter,

Thanks for answering. I added my replies inline. As a next step, I will prepare a patch (hopefully in the next few days) so that we can comment on something more concrete.


On 8 Feb 2013, at 16:07, Peter Hancock <pe...@gmail.com> wrote:

> Hi Alexis,
> 
> The DefaultTempResourceResolver is used by the DefaultResourceResolver
> and is suitable for the case when we run FOP from the command line
> 
> 
>> ... the actual pattern of using temp files has changed from:
>> 
>> File tmpFile = File.createTempFile(....);
>> // Write and read from the file
>> tmpFile.delete();
>> 
>> to:
>> File tmpFile = new File(System.getProperty("java.io.tmpdir"), counterStartingFrom1AsString);
>> tmpFile.deleteOnExit();
>> // Write and read from the file
>> 
>> 
>> This introduces  a number of bad side effects:
>> 1. Different FOP processes can't be executed in parallel on the same system because creating the same temp file fails.
>> 
>> 2. If the JVM is not normally terminated, the temp files are never deleted and the next invocation of the JVM fails to run.
>> 
>> 3. deleteOnExit() keeps for the life of the JVM an unknown number of temp files both on disk and a reference in memory.
>> 
> The DefaultTempResourceResolver is used by the DefaultResourceResolver
> and is suitable in the case when we run FOP from the command line and
> sequentially at the system level.  I agree that the first side effect
> is problematic and the URI generator should perhaps add some random
> characters or something similar.  The 2nd and 3rd points are not
> an issue when we run FOP from the command line.
> 
> When running FOP as part of a long-running process I would recommend
> creating an implementation of TempResourceResolver that handles the
> cleanup:  This implementation could return an instance of Resource in
> the getResource method that is parametrised with an InputStream that
> uses close() to perform the file deletion - you could even decide to
> store the temporary data in memory if your requirements make that
> feasible.


I guess that most people execute FOP from the command line, and for them those points can indeed be fixed by adding a random part to the filenames. But how can the rest of the FOP users accept that, starting from FOP v1.2 (or 2.0?), they have to implement a resolver in order to get the decent handling that FOP used to have?

The semantics are also not clear. Why should a resource be deleted when an input stream is closed? What if the implementation needs to read a tmp file (or part of it) twice?

Temp files differ a lot from resources like fonts. In my opinion, even in a cloud environment, they should be written to the local disk. Why would anyone need to transfer FOP's temp files over the network? Access to the local disk is never lost and is independent of network traffic, the OS may cache the files in memory, and FOP may have random access to them directly or by memory-mapping them.
 


> 
>> I am also tempted to delete the org.apache.xmlgraphics.io.Resource and change in ResourceResolver the
>> Resource getResource(URI uri)
>> to
>> InputStream getInputStream(URI uri)
>> 
>> 
>> Although in [1] there is a reason for having it, in practice the "type" is used nowhere.
>> 
> Although the type property is not used yet, it was introduced as a way
> of being able to set something like a mime type on a resource without
> having to encode that in the URL.  For example, if FOP needed to be
> able to determine the type of font that a resource represented it
> would be better to declare this as a property, rather than perhaps
> using a pseudo file extension.  This was loosely inspired by the HTTP
> protocol that uses the 'content-type' header field to identify the
> content type of a response.
> 
> Please ask me to expand on any of my explanations if you require more detail.
> 
> Thanks,
> 
> Pete


Re: Trouble with temporary files after the merge of Temp_URI_Resolution branch

Posted by Peter Hancock <pe...@gmail.com>.
Hi Alexis,

The DefaultTempResourceResolver is used by the DefaultResourceResolver
and is suitable for the case when we run FOP from the command line


>... the actual pattern of using temp files has changed from:
>
> File tmpFile = File.createTempFile(....);
> // Write and read from the file
> tmpFile.delete();
>
> to:
> File tmpFile = new File(System.getProperty("java.io.tmpdir"), counterStartingFrom1AsString);
> tmpFile.deleteOnExit();
> // Write and read from the file
>
>
> This introduces  a number of bad side effects:
> 1. Different FOP processes can't be executed in parallel on the same system because creating the same temp file fails.
>
> 2. If the JVM is not normally terminated, the temp files are never deleted and the next invocation of the JVM fails to run.
>
> 3. deleteOnExit() keeps for the life of the JVM an unknown number of temp files both on disk and a reference in memory.
>
The DefaultTempResourceResolver is used by the DefaultResourceResolver
and is suitable in the case when we run FOP from the command line and
sequentially at the system level.  I agree that the first side effect
is problematic and the URI generator should perhaps add some random
characters or something similar.  The 2nd and 3rd points are not
an issue when we run FOP from the command line.

When running FOP as part of a long-running process I would recommend
creating an implementation of TempResourceResolver that handles the
cleanup:  This implementation could return an instance of Resource in
the getResource method that is parametrised with an InputStream that
uses close() to perform the file deletion - you could even decide to
store the temporary data in memory if your requirements make that
feasible.
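
A minimal sketch of such a stream, assuming the temp data lives in a local file (the class name DeleteOnCloseInputStream is made up for illustration, and wrapping it in a Resource is left out):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stream for illustration: deletes its backing file when closed. */
class DeleteOnCloseInputStream extends FilterInputStream {
    private final Path file;

    DeleteOnCloseInputStream(Path file) throws IOException {
        super(Files.newInputStream(file));
        this.file = file;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // Runs even if closing the underlying stream fails.
            Files.deleteIfExists(file);
        }
    }
}
```

Note that with this approach the data can only be read once, since the file is gone after close(). The JDK's StandardOpenOption.DELETE_ON_CLOSE could probably be used to similar effect.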

> I am also tempted to delete the org.apache.xmlgraphics.io.Resource and change in ResourceResolver the
> Resource getResource(URI uri)
> to
> InputStream getInputStream(URI uri)
>
>
> Although in [1] there is a reason for having it, in practice the "type" is used nowhere.
>
Although the type property is not used yet, it was introduced as a way
of being able to set something like a mime type on a resource without
having to encode that in the URL.  For example, if FOP needed to be
able to determine the type of font that a resource represented it
would be better to declare this as a property, rather than perhaps
using a pseudo file extension.  This was loosely inspired by the HTTP
protocol that uses the 'content-type' header field to identify the
content type of a response.

Please ask me to expand on any of my explanations if you require more detail.

Thanks,

Pete