Posted to dev@pdfbox.apache.org by "Pascal Essiembre (JIRA)" <ji...@apache.org> on 2015/03/22 05:46:10 UTC

[jira] [Created] (PDFBOX-2723) PDFBox*.tmp files not deleted by COSParser

Pascal Essiembre created PDFBOX-2723:
----------------------------------------

             Summary: PDFBox*.tmp files not deleted by COSParser 
                 Key: PDFBOX-2723
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2723
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 2.0.0
         Environment: Windows and Linux, with issue being critical on Linux
            Reporter: Pascal Essiembre
             Fix For: 2.0.0


When parsing PDFs, temporary files are created under the system temp directory (e.g. PDFBox6525369863339991063.tmp).  All of the files created for a document are deleted except one, so each parsed document leaves behind a new tmp file that never gets deleted.  This is likely due to a stream that is never closed.  When processing many PDFs on Linux in the same JVM instance, we eventually get the fatal error: "Too many files open".  Raising the maximum number of open file handles on the OS is not always an option.
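One way to confirm the leak is to count the matching files in the temp directory before and after parsing a batch of documents. The following is only a minimal sketch: the class {{TempFileLeakCheck}} and its method are hypothetical helpers, not part of PDFBox, and the glob simply mirrors the file name pattern reported above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: counts leftover PDFBox*.tmp files.
class TempFileLeakCheck
{
    // Returns the number of entries in dir whose name matches PDFBox*.tmp
    static int countPdfBoxTempFiles(Path dir) throws IOException
    {
        int count = 0;
        // The directory stream is itself a resource and is closed by try-with-resources
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "PDFBox*.tmp"))
        {
            for (Path ignored : files)
            {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException
    {
        Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
        int before = countPdfBoxTempFiles(tempDir);
        // ... parse a batch of PDFs here (omitted; depends on the application) ...
        int after = countPdfBoxTempFiles(tempDir);
        System.out.println("PDFBox temp files left behind: " + (after - before));
    }
}
```

If the count grows by one for every document parsed, that matches the behaviour described in this report.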

I was able to fix this by modifying the {{COSParser}} class to close a {{COSStream}} instance:

{code:title=COSParser.java, starting on line 312|borderStyle=solid}
    private long parseXrefObjStream(long objByteOffset, boolean isStandalone) throws IOException
    {
        // ---- parse indirect object head
        readObjectNumber();
        readGenerationNumber();
        readExpectedString(OBJ_MARKER, true);

        COSDictionary dict = parseCOSDictionary();
        COSStream xrefStream = parseCOSStream(dict);
        parseXrefStream(xrefStream, (int) objByteOffset, isStandalone);
        xrefStream.close();  // <--- *** NEW LINE ***
        return dict.getLong(COSName.PREV);
    }
{code}
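
Note that with the change above the stream would still be left open if {{parseXrefStream}} throws. A possible refinement is to close it in a {{finally}} block. The following is a self-contained sketch of that pattern only, under stated assumptions: {{TempBackedStream}} and {{parseThenClose}} are hypothetical stand-ins for illustration, not the PDFBox {{COSStream}} or {{COSParser}} code.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;

class CloseOnFailureSketch
{
    // Hypothetical stand-in for a stream backed by a temp file; NOT the PDFBox COSStream.
    static class TempBackedStream implements Closeable
    {
        final File file;

        TempBackedStream() throws IOException
        {
            file = File.createTempFile("Sketch", ".tmp");
        }

        @Override
        public void close()
        {
            // Releasing the stream removes its backing temp file
            file.delete();
        }
    }

    // Hypothetical stand-in for the parsing step: closes the stream whether or not parsing fails.
    static void parseThenClose(TempBackedStream stream, boolean simulateFailure) throws IOException
    {
        try
        {
            if (simulateFailure)
            {
                throw new IOException("simulated parse failure");
            }
        }
        finally
        {
            // Runs on both the normal and the exceptional path
            stream.close();
        }
    }
}
```

In the sketch, the temp file is removed even when the parsing step fails, because the {{finally}} block runs before the exception propagates to the caller.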
