You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Keenan Ross <ke...@broad.mit.edu> on 2006/12/13 00:36:34 UTC

FileUpload runs out of memory unnecessairly when FileCleaner Can't Keep up

Commons FileUpload uses the FileCleaner class from Commons IO to ensure 
that all temporary files are deleted when the corresponding Java object 
goes out of scope. FileCleaner works well in most situations, but can 
lead to OutOfMemoryExceptions in long running active programs. For 
example, in the situation where we have a web server with 50 threads 
uploading files, the low priority single cleaning thread doesn't get 
enough CPU time to empty its queue. Thus the queue holding the list of 
files to be deleted grows until memory is exhausted. This is even more 
ironic in our environment because we are careful to delete all the 
temporary files, so the FileCleaner never finds anything that needs 
deleting anyway on its long queue.

We have seen up to 300,000 objects on the FileCleaner queue, each of 
which references a large amount of memory, including a couple of IO 
buffers, keeping it from being garbage collected. One possible solution 
is to minimize the size of the marker object put on the ReferenceQueue, 
perhaps just the file name instead of the whole DiskFileItem, but that 
only postpones the problem since the queue may still not get enough time 
to empty.

Another potential solution is to remove the file from the FileCleaner 
queue when it is explicitly deleted, but I don't see any mechanism in 
the underlying ReferenceQueue to remove arbitrary items.

So the approach we took was to subclass DiskFileItem with an alternative 
implementation that 1) does not enqueue the DiskFileItem with the 
FileCleaner and 2) has a empty finalize method (to speed up garbage 
collection). We called that class UnmanagedDeleteDiskFileItem to 
indicate that the user must explicitly clean up the temporary file 
(typically with finally blocks). We also coded an alternative 
DiskFileItemFactory whose constructor took a boolean indicating whether 
the returned items were to be implicitly or explicitly managed, choosing 
either the old or new DiskFileItem implementation respectively.

I'd like to ask the FileUpload maintainers to consider these options:
1) Adopt a strategy like the one we use to allow the coder to ask the 
factory for either managed or unmanaged DiskFileItems, both of which are 
supplied by the FileUpload package. I have attached our 
DiskFileItemFactory and UnmanagedDeleteDiskFileItem to use as a starting 
point of this enhancement.
2) Make DiskFileItem more supportive of subclassing, by adding a 
protected getTempFileName method which the subclass can call. This would 
allow users to cleanly subclass DiskFileItem. Currently, our 
UnmanagedDeleteDiskFileItem implementation has to duplicate some of the 
code from DiskFileItem because too much was marked private.
--keenan