Posted to dev@pdfbox.apache.org by anita kulkarni <an...@yahoo.com> on 2013/10/23 21:42:17 UTC

Parsing a large PDF file using PDFBox

Hi
    I need to add a header and footer to an existing PDF document. This works fine, but when the PDF gets large (>18.9 MB) I get an out-of-memory error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
    at org.apache.pdfbox.cos.COSStream.createUnfilteredStream(COSStream.java:459)
    at org.apache.pdfbox.pdmodel.common.PDStream.createOutputStream(PDStream.java:218)
    at org.apache.pdfbox.pdmodel.edit.PDPageContentStream.<init>(PDPageContentStream.java:240)
    at com.akimeka.jmar.pdfwriter.AddFOUOToReport.doIt(AddFOUOToReport.java:83)
    at com.akimeka.jmar.pdfwriter.AddFOUOToReport.main(AddFOUOToReport.java:137)

It is the same code I used when I reported a bug in PDFBox. Here is the code for reference:
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Adds `message` as a header and footer to every page of `file` and writes the result to `outfile`
// (file, outfile and message are the values used by the original program).
static void doIt(String file, String outfile, String message)
        throws IOException, COSVisitorException
{
    PDDocument doc = null;
    try
    {
        // second argument is the scratch file; null = none
        doc = PDDocument.loadNonSeq(new File(file), null);

        List<?> allPages = doc.getDocumentCatalog().getAllPages();
        PDFont font = PDType1Font.HELVETICA;
        float fontSize = 15.0f;
        // Type 1 metrics are in 1/1000 of the font size, so scale to user-space units
        float stringWidth = font.getStringWidth(message) * fontSize / 1000f;

        for (int i = 0; i < allPages.size(); i++)
        {
            System.out.println("i = " + i);
            PDPage page = (PDPage) allPages.get(i);
            PDRectangle pageSize = page.findMediaBox();

            // calculate the center of the page; width and height swap on rotated pages
            int rotation = page.findRotation();
            boolean rotate = rotation == 90 || rotation == 270;
            float pageWidth = rotate ? pageSize.getHeight() : pageSize.getWidth();
            float pageHeight = rotate ? pageSize.getWidth() : pageSize.getHeight();
            double centeredXPosition = rotate ? pageHeight / 2f : (pageWidth - stringWidth) / 2f;

            // append to the existing content (append = true, compress = true, resetContext = true)
            PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);
            try
            {
                // header
                contentStream.beginText();
                contentStream.setFont(font, fontSize);
                if (rotate)
                {
                    // rotate the text according to the page rotation
                    contentStream.setTextRotation(Math.PI / 2, centeredXPosition, 0);
                }
                else
                {
                    contentStream.setTextTranslation(centeredXPosition, 0);
                }
                contentStream.moveTextPositionByAmount(0, pageHeight - 50);
                contentStream.drawString(message);
                contentStream.endText();

                // footer: font and size carry over, since they are part of the graphics state
                contentStream.beginText();
                contentStream.moveTextPositionByAmount((float) centeredXPosition, 50);
                contentStream.drawString(message);
                contentStream.endText();
            }
            finally
            {
                contentStream.close();
            }
        }

        doc.save(outfile);
    }
    finally
    {
        // release the document even if processing fails
        if (doc != null)
        {
            doc.close();
        }
    }
}

Any suggestions you can provide are greatly appreciated.

-Anita
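The placement arithmetic in the loop above can be checked on its own, without PDFBox. The sketch below is a hypothetical helper (the class HeaderPlacement and its methods are illustrative, not PDFBox API) that mirrors the original formulas: the string width is the glyph-unit width scaled by fontSize / 1000, width and height swap for 90/270 degree rotation, and the unrotated header is centered at (pageWidth - stringWidth) / 2. The 4000-unit string width and the US Letter media box are assumed example values.

```java
// Hypothetical helper mirroring the header/footer arithmetic; no PDFBox calls are made.
final class HeaderPlacement {

    /** Width in user-space units of a string whose width is given in 1/1000 font-size units. */
    static float stringWidth(float glyphUnits, float fontSize) {
        return glyphUnits * fontSize / 1000f;
    }

    /** Width of the page as the reader sees it: media width and height swap at 90/270 degrees. */
    static float visibleWidth(float mediaWidth, float mediaHeight, int rotation) {
        boolean rotate = rotation == 90 || rotation == 270;
        return rotate ? mediaHeight : mediaWidth;
    }

    /** Height of the page as the reader sees it. */
    static float visibleHeight(float mediaWidth, float mediaHeight, int rotation) {
        boolean rotate = rotation == 90 || rotation == 270;
        return rotate ? mediaWidth : mediaHeight;
    }

    /** X offset that centers a string of the given width on an unrotated page. */
    static float centeredX(float pageWidth, float stringWidth) {
        return (pageWidth - stringWidth) / 2f;
    }

    public static void main(String[] args) {
        // Assumed example: US Letter media box (612 x 792 points), unrotated page.
        float w = stringWidth(4000f, 15f);              // 60.0 points
        float pageW = visibleWidth(612f, 792f, 0);      // 612.0
        float pageH = visibleHeight(612f, 792f, 0);     // 792.0
        System.out.println("x = " + centeredX(pageW, w));    // x = 276.0
        System.out.println("header y = " + (pageH - 50));    // header y = 742.0
        System.out.println("footer y = 50");
    }
}
```

Keeping the arithmetic in small pure methods makes it easy to test placement separately from the memory problem discussed in this thread.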

Re: Parsing a large PDF file using PDFBox

Posted by Fred Hansen <zw...@yahoo.com>.
If the 65,088 is from Runtime.getRuntime().maxMemory(), then that is a very small amount of memory for running a Java program.
You can increase the memory available by passing a parameter to java. For instance, -Xmx1024m gives you one gigabyte.
Googling "java runtime memory settings" gave me a pointer to one way to set the parameter:
      http://www.wikihow.com/Increase-Java-Memory-in-Windows-7
Other methods are available if you are not using Windows 7.

Fred Hansen
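Fred's suggestion can be checked from inside the program. The sketch below is a hypothetical helper (the class HeapLimit and the method parseXmx are illustrative, not part of the JDK or PDFBox) that converts a -Xmx style size into bytes and prints it next to Runtime.getRuntime().maxMemory(), the heap cap of the running JVM. That cap comes from -Xmx (or the JVM's default), not from the machine's physical RAM, and some JVMs report a value slightly below the -Xmx setting. When the program is launched from Eclipse, the option goes into the VM arguments of the run configuration.

```java
import java.util.Locale;

// Hypothetical helper: converts a -Xmx style size to bytes and compares it with the live heap cap.
final class HeapLimit {

    /** Parses "-Xmx1024m", "1024m", "2g", "512k" or a plain byte count into bytes. */
    static long parseXmx(String option) {
        String value = option.startsWith("-Xmx") ? option.substring(4) : option;
        char unit = Character.toLowerCase(value.charAt(value.length() - 1));
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            default:  return Long.parseLong(value);   // no suffix: plain bytes
        }
        return Long.parseLong(value.substring(0, value.length() - 1)) * multiplier;
    }

    public static void main(String[] args) {
        long requested = parseXmx("-Xmx1024m");
        long actual = Runtime.getRuntime().maxMemory();   // heap cap of this JVM
        System.out.printf(Locale.ROOT, "-Xmx1024m = %d MB%n", requested / (1024 * 1024));
        System.out.printf(Locale.ROOT, "maxMemory() of this JVM = %d MB%n", actual / (1024 * 1024));
    }
}
```

Running it once without and once with -Xmx1024m shows whether the setting actually reached the JVM that runs the PDF code.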

--------------------------------------------
On Thu, 10/24/13, anita kulkarni <an...@yahoo.com> wrote:

 Subject: Re: Parsing a large PDF file using PDFBox
 To: "dev@pdfbox.apache.org" <de...@pdfbox.apache.org>
 Date: Thursday, October 24, 2013, 10:41 AM
 

Re: Parsing a large PDF file using PDFBox

Posted by anita kulkarni <an...@yahoo.com>.
I have about 8 GB of RAM, and about 4 GB of it is in use in general. I don't see any spike when the big or small PDF is run in Eclipse. I added Runtime runtime = Runtime.getRuntime(); to my app and I get:
free memory: 13
allocated memory: 65,088
max memory: 65,088
total free memory: 13

and just before it crashes:
Memory Stat = free memory: 0
allocated memory: 65,088
max memory: 65,088
total free memory: 0

Let me know if this makes any sense.

Thank you
-Anita
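The labels in the output above appear to come from a common Runtime-reporting pattern that divides each value by 1024, so the figures are presumably kilobytes: 65,088 KB is about 63.6 MB, far below the machine's 8 GB. The sketch below (the class MemoryStats and its methods are hypothetical) reproduces that format from raw byte counts. "Total free" is the free space in the current heap plus the room the heap can still grow (max - allocated); when allocated equals max, as in both reports above, the heap has already reached its cap, so a "total free" of 0 means the next allocation fails with OutOfMemoryError.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper reproducing the memory report format shown above (values in KB).
final class MemoryStats {

    /** Formats a byte count as whole kilobytes with grouping, e.g. 66650112 -> "65,088". */
    static String kb(long bytes) {
        return NumberFormat.getIntegerInstance(Locale.US).format(bytes / 1024);
    }

    /** Free space in the current heap plus the room the heap may still grow. */
    static long totalFree(long free, long allocated, long max) {
        return free + (max - allocated);
    }

    static String report(long free, long allocated, long max) {
        return "free memory: " + kb(free) + "\n"
             + "allocated memory: " + kb(allocated) + "\n"
             + "max memory: " + kb(max) + "\n"
             + "total free memory: " + kb(totalFree(free, allocated, max));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // freeMemory/totalMemory/maxMemory are all in bytes
        System.out.println(report(rt.freeMemory(), rt.totalMemory(), rt.maxMemory()));
    }
}
```

Feeding in 13,312 bytes free with 66,650,112 bytes allocated and max reproduces the first report above (13 / 65,088 / 65,088 / 13).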




On Wednesday, October 23, 2013 8:06 PM, Fred Hansen <zw...@yahoo.com> wrote:
 
How much memory does your computer have? How much memory does the
program use when processing a small PDF?


Re: Parsing a large PDF file using PDFBox

Posted by Fred Hansen <zw...@yahoo.com>.
How much memory does your computer have? How much memory does the
program use when processing a small PDF?

--------------------------------------------
On Wed, 10/23/13, anita kulkarni <an...@yahoo.com> wrote:

 Subject: Parsing a large PDF file using PDFBox
 To: "dev@pdfbox.apache.org" <de...@pdfbox.apache.org>
 Date: Wednesday, October 23, 2013, 3:42 PM
 