Posted to dev@pdfbox.apache.org by anita kulkarni <an...@yahoo.com> on 2013/10/23 21:42:17 UTC
Parsing a large PDF file using PDFBox
Hi
I need to add a header and footer to an existing PDF document. This works fine, but when the PDF gets large (>18.9 MB) I get an out-of-memory error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
    at org.apache.pdfbox.cos.COSStream.createUnfilteredStream(COSStream.java:459)
    at org.apache.pdfbox.pdmodel.common.PDStream.createOutputStream(PDStream.java:218)
    at org.apache.pdfbox.pdmodel.edit.PDPageContentStream.<init>(PDPageContentStream.java:240)
    at com.akimeka.jmar.pdfwriter.AddFOUOToReport.doIt(AddFOUOToReport.java:83)
    at com.akimeka.jmar.pdfwriter.AddFOUOToReport.main(AddFOUOToReport.java:137)
It is the same code I included when I reported a bug in PDFBox. Here is the code for your reference:
// the document
PDDocument doc = null;
File file1 = new File(file);
try
{
    //RandomAccessFile raf = new RandomAccessFile(file, "r");
    //RandomAccess ra = new RandomAcsess
    doc = PDDocument.loadNonSeq(file1, null);//load( file );
    List<PDPage> allPages = doc.getDocumentCatalog().getAllPages();
    PDFont font = PDType1Font.HELVETICA;
    float fontSize = 15.0f;
    for( int i=0; i<allPages.size(); i++ )
    //int i=0;
    {
        System.out.println("i = " + i);
        PDPage page = (PDPage)allPages.get( i );
        PDRectangle pageSize = page.findMediaBox();
        float stringWidth = font.getStringWidth( message )*fontSize/1000f;
        // calculate to center of the page :
        int rotation = page.findRotation();
        boolean rotate = rotation == 90 || rotation == 270;
        float pageWidth = rotate ? pageSize.getHeight() : pageSize.getWidth();
        float pageHeight = rotate ? pageSize.getWidth() : pageSize.getHeight();
        double centeredXPosition = rotate ? pageHeight/2f : (pageWidth - stringWidth)/2f;
        double centeredYPosition = rotate ? (pageWidth - stringWidth)/2f : pageHeight/2f;
        // append the content to the existing stream
        PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);
        page.getResources().getFonts();
        contentStream.beginText();
        // set font and font size
        contentStream.setFont( font, fontSize );
        //set text color to red
        //contentStream.setNonStrokingColor(255, 0, 0);
        if (rotate)
        {
            // rotate the text according to the page rotation
            contentStream.setTextRotation(Math.PI/2, centeredXPosition, 0);
        }
        else
        {
            contentStream.setTextTranslation(centeredXPosition, 0);
        }
        //contentStream.moveTextPositionByAmount(pageWidth/2 - 130, pageHeight-20);
        contentStream.moveTextPositionByAmount(0, pageHeight-50);
        contentStream.drawString( message);
        contentStream.endText();
        //Now repeat the same for footer
        //page.getResources().getFonts();
        contentStream.beginText();
        // set font and font size
        //contentStream.setFont( font, fontSize );
        contentStream.moveTextPositionByAmount((int)centeredXPosition, 50);
        contentStream.drawString( message);
        contentStream.endText();
        //Close contentStream.
        contentStream.close();
    }
    doc.save( outfile );
}
Any suggestions you can provide are greatly appreciated.
-Anita
Re: Parsing a large PDF file using PDFBox
Posted by Fred Hansen <zw...@yahoo.com>.
If the 65,088 is from Runtime.getRuntime().maxMemory() then that is a very small amount of memory for running a Java program.
You can expand the available memory by passing a parameter to java. For instance, -Xmx1024m gives you one gigabyte of heap.
Googling "java runtime memory settings" gave me a pointer to one way to set the parameter:
http://www.wikihow.com/Increase-Java-Memory-in-Windows-7
Other means are available if you are not using Windows 7.
Fred Hansen
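To confirm that a setting such as -Xmx1024m actually reached the JVM (for example when the program is launched from an IDE), the program can print its own maximum heap at startup. The following is a minimal sketch and not part of the original messages; the class name HeapCheck, its helper methods, and the 512 MB threshold are assumptions chosen for illustration.

```java
public class HeapCheck {
    // Hypothetical helper: converts a byte count to whole megabytes.
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Returns true when the JVM's maximum heap is at least requiredMb megabytes.
    public static boolean hasAtLeast(long requiredMb) {
        return toMegabytes(Runtime.getRuntime().maxMemory()) >= requiredMb;
    }

    public static void main(String[] args) {
        long maxMb = toMegabytes(Runtime.getRuntime().maxMemory());
        System.out.println("max heap: " + maxMb + " MB");
        // The 512 MB threshold is an arbitrary value used only for this sketch.
        if (!hasAtLeast(512)) {
            System.out.println("Heap looks small; consider starting the JVM with e.g. -Xmx1024m");
        }
    }
}
```

It can be run with, for example, java -Xmx1024m HeapCheck; the reported value may come out slightly below the requested size.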
Re: Parsing a large PDF file using PDFBox
Posted by anita kulkarni <an...@yahoo.com>.
I have about 8 GB, of which ~4 GB is in general use. I don't see any spike when either the big or the small PDF is run in Eclipse. I added Runtime runtime = Runtime.getRuntime(); to my app and I get:
free memory: 13
allocated memory: 65,088
max memory: 65,088
total free memory: 13
and just before it crashes:
Memory Stat = free memory: 0
allocated memory: 65,088
max memory: 65,088
total free memory: 0
Let me know if this makes any sense..
Thank you
-Anita
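The four figures above can be derived from the three values that java.lang.Runtime exposes. The sketch below is one way to do it, assuming the values are reported in kilobytes and that "total free memory" means the free part of the current heap plus the heap that has not been allocated yet. Neither assumption is confirmed in the thread, and the class name MemoryStats is hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class MemoryStats {
    // Builds a report with the same four labels used in the message above.
    // Inputs are byte counts; output is in kilobytes (an assumed unit).
    public static String report(long free, long allocated, long max) {
        NumberFormat fmt = NumberFormat.getIntegerInstance(Locale.US);
        // Free space in the current heap plus the room the heap can still grow by.
        long totalFree = free + (max - allocated);
        return "free memory: " + fmt.format(free / 1024) + "\n"
             + "allocated memory: " + fmt.format(allocated / 1024) + "\n"
             + "max memory: " + fmt.format(max / 1024) + "\n"
             + "total free memory: " + fmt.format(totalFree / 1024) + "\n";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.print(report(rt.freeMemory(), rt.totalMemory(), rt.maxMemory()));
    }
}
```

When allocated memory equals max memory, as in the figures above, the heap has already grown to its limit, so total free memory is the same as free memory.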
Re: Parsing a large PDF file using PDFBox
Posted by Fred Hansen <zw...@yahoo.com>.
How much memory has your computer got? How much space does the
program occupy when processing a small pdf?