Posted to dev@pdfbox.apache.org by GitBox <gi...@apache.org> on 2021/09/27 18:24:48 UTC

[GitHub] [pdfbox] ds3v opened a new pull request #131: Reduce memory usage

ds3v opened a new pull request #131:
URL: https://github.com/apache/pdfbox/pull/131


   We are trying to use PDFMergerUtility to merge a huge number of PDF files (up to 50000 pages in total), but heap usage grows too large. We tried MemoryUsageSetting.setupTempFileOnly(), but that does not seem to help.  
   We analyzed heap dumps and found that most of the heap is occupied by current page buffers (arrays of 4096 bytes) that are still referenced from ScratchFileBuffer.
   The idea of this fix is to release ScratchFileBuffer's reference to its current page buffer once the COSStream has been completely processed, so the array can be garbage-collected.
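   A minimal sketch of the idea, assuming a deliberately simplified model: the hypothetical `PagedBuffer` class below is not PDFBox's `ScratchFileBuffer`. It only shows how nulling the current-page reference once a stream is finished lets the 4096-byte array be garbage-collected, even while the buffer object itself stays reachable.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /**
    * Hypothetical, simplified model of a paged scratch buffer (NOT PDFBox's
    * ScratchFileBuffer). It illustrates dropping the reference to the current
    * page array once the owning stream has been completely processed.
    */
   public class PageBufferReleaseSketch
   {
       static final int PAGE_SIZE = 4096;

       static final class PagedBuffer
       {
           // Page currently being written; strongly reachable while non-null.
           private byte[] currentPage = new byte[PAGE_SIZE];
           private int positionInPage = 0;

           void write(byte b)
           {
               if (currentPage == null)
               {
                   throw new IllegalStateException("buffer already released");
               }
               if (positionInPage == PAGE_SIZE)
               {
                   // A real implementation would flush the full page to a scratch file here.
                   positionInPage = 0;
               }
               currentPage[positionInPage++] = b;
           }

           /** Called once the owning stream has been completely processed. */
           void releaseCurrentPage()
           {
               currentPage = null;
           }

           long retainedBytes()
           {
               return currentPage == null ? 0 : currentPage.length;
           }
       }

       /** Total bytes still held by current pages across all buffers. */
       static long totalRetained(List<PagedBuffer> buffers)
       {
           long sum = 0;
           for (PagedBuffer buffer : buffers)
           {
               sum += buffer.retainedBytes();
           }
           return sum;
       }

       public static void main(String[] args)
       {
           List<PagedBuffer> buffers = new ArrayList<>();
           for (int i = 0; i < 1000; i++)
           {
               PagedBuffer buffer = new PagedBuffer();
               buffer.write((byte) i);
               buffers.add(buffer); // buffer objects stay reachable, as during a long merge
           }
           System.out.println("before release: " + totalRetained(buffers));
           for (PagedBuffer buffer : buffers)
           {
               buffer.releaseCurrentPage();
           }
           System.out.println("after release: " + totalRetained(buffers));
       }
   }
   ```

   Running `main` reports 4096000 retained bytes before the release and 0 after it: the 1000 buffer objects remain in the list, but their page arrays are no longer referenced.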
   
   Sample code:
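   ```java
   import java.io.Closeable;
   import java.io.File;
   import java.io.FileInputStream;
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.pdfbox.io.IOUtils;
   import org.apache.pdfbox.io.MemoryUsageSetting;
   import org.apache.pdfbox.multipdf.PDFMergerUtility;

   public class PdfBoxLargePdf
   {
       public static void main(String[] args) throws Exception
       {
           int fileCount = 2000;
           // Java does not expand "~", so resolve the home directory explicitly
           File source = new File(System.getProperty("user.home"), "exchange/source.pdf");
           List<Closeable> toBeClosed = new ArrayList<Closeable>(fileCount);
           try
           {
               PDFMergerUtility utility = new PDFMergerUtility();
               // Add the same source document fileCount times
               for (int i = 0; i < fileCount; i++)
               {
                   FileInputStream fis = new FileInputStream(source);
                   toBeClosed.add(fis);
                   utility.addSource(fis);
               }
               utility.setDestinationFileName("target/combined.pdf");
               // Buffer data in temporary files instead of main memory
               utility.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
           }
           finally
           {
               // Close all source streams even if the merge fails
               for (Closeable closeable : toBeClosed)
               {
                   IOUtils.closeQuietly(closeable);
               }
           }
       }
   }
   ```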
   The sample uses PDFBox from the 2.0 branch and runs with the VM options `-Xmx1G -XX:+HeapDumpOnOutOfMemoryError`.
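   As a rough back-of-the-envelope bound (an illustration, not a measurement from this PR): `-Xmx1G` is 1 GiB, so the heap can hold at most 2^30 / 4096 = 262144 page buffers of 4096 bytes. The real number is lower, because each array also carries an object header and the heap holds everything else the merge needs. The helper name `maxPageBuffers` is hypothetical.

   ```java
   public class PageBufferBudget
   {
       /** Upper bound on how many page arrays of pageSize bytes fit in heapBytes, ignoring all overhead. */
       static long maxPageBuffers(long heapBytes, int pageSize)
       {
           return heapBytes / pageSize;
       }

       public static void main(String[] args)
       {
           long heapBytes = 1024L * 1024 * 1024; // -Xmx1G means 1 GiB
           System.out.println(maxPageBuffers(heapBytes, 4096)); // prints 262144
       }
   }
   ```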
   
   On the 2.0 branch this code fails with an OutOfMemoryError after processing only 1058 of the 2000 source documents:
   ![image](https://user-images.githubusercontent.com/25397526/134963758-708437fc-0989-433a-b0d2-e507129c627b.png)
   ![image](https://user-images.githubusercontent.com/25397526/134963783-5c16ce04-7ead-476d-acef-a295a849901a.png)
    
   With the fix, the sample completes successfully and processes all 2000 source files:
   ![image](https://user-images.githubusercontent.com/25397526/134963823-0d26d3bb-762f-4ef6-96f2-09361015f806.png)
   ![image](https://user-images.githubusercontent.com/25397526/134963848-1edab947-7102-4d99-8174-af41f71d8a24.png)
   
    
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: dev-help@pdfbox.apache.org