You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Keenan Ross <ke...@broad.mit.edu> on 2006/12/13 00:36:34 UTC
FileUpload runs out of memory unnecessairly when FileCleaner Can't
Keep up
Commons FileUpload uses the FileCleaner class from Commons IO to ensure
that all temporary files are deleted when the corresponding Java object
goes out of scope. FileCleaner works well in most situations, but can
lead to OutOfMemoryExceptions in long running active programs. For
example, in the situation where we have a web server with 50 threads
uploading files, the low priority single cleaning thread doesn't get
enough CPU time to empty its queue. Thus the queue holding the list of
files to be deleted grows until memory is exhausted. This is even more
ironic in our environment because we are careful to delete all the
temporary files, so the FileCleaner never finds anything that needs
deleting anyway on its long queue.
We have seen up to 300,000 objects on the FileCleaner queue, each of
which references a large amount of memory, including a couple of IO
buffers, keeping it from being garbage collected. One possible solution
is to minimize the size of the marker object put on the ReferenceQueue,
perhaps just the file name instead of the whole DiskFileItem, but that
only postpones the problem since the queue may still not get enough time
to empty.
Another potential solution is to remove the file from the FileCleaner
queue when it is explicitly deleted, but I don't see any mechanism in
the underlying ReferenceQueue to remove arbitrary items.
So the approach we took was to subclass DiskFileItem with an alternative
implementation that 1) does not enqueue the DiskFileItem with the
FileCleaner and 2) has a empty finalize method (to speed up garbage
collection). We called that class UnmanagedDeleteDiskFileItem to
indicate that the user must explicitly clean up the temporary file
(typically with finally blocks). We also coded an alternative
DiskFileItemFactory whose constructor took a boolean indicating whether
the returned items were to be implicitly or explicitly managed, choosing
either the old or new DiskFileItem implementation respectively.
I'd like to ask the FileUpload maintainers to consider these options:
1) Adopt a strategy like the one we use to allow the coder to ask the
factory for either managed or unmanaged DiskFileItems, both of which are
supplied by the FileUpload package. I have attached our
DiskFileItemFactory and UnmanagedDeleteDiskFileItem to use as a starting
point of this enhancement.
2) Make DiskFileItem more supportive of subclassing, by adding a
protected getTempFileName method which the subclass can call. This would
allow users to cleanly subclass DiskFileItem. Currently, our
UnmanagedDeleteDiskFileItem implementation has to duplicate some of the
code from DiskFileItem because too much was marked private.
--keenan