Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2022/07/31 12:41:00 UTC

[jira] [Commented] (PDFBOX-5462) OutOfMemoryError when watermaking in 3.0.0-RC1

    [ https://issues.apache.org/jira/browse/PDFBOX-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573416#comment-17573416 ] 

Andreas Lehmkühler commented on PDFBOX-5462:
--------------------------------------------

To sum it up, 2.0.x did some caching under the hood which was removed in 3.0.x. Many aspects of the 3.0.x code base were refactored to optimize the memory footprint. However, this might still lead to an OutOfMemoryError in some corner cases if 2.0.x code isn't adjusted to those changes in behaviour when it is ported to 3.0.x.

The good news is that in many cases one can "simulate" the 2.0.x behaviour when using an InputStream, depending on which memory setting the 2.0.x code used:

* {{MemoryUsageSetting.setupMainMemoryOnly()}} -> use {{org.apache.pdfbox.io.RandomAccessReadBuffer}}, which copies the whole InputStream into memory. This works fine for small files.
* {{MemoryUsageSetting.setupTempFileOnly()}} -> copy the InputStream to a (temp) file and use {{org.apache.pdfbox.io.RandomAccessReadBufferedFile}} or {{org.apache.pdfbox.io.RandomAccessReadMemoryMappedFile}}. This is the right choice for bigger files and/or environments with limited resources.
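
The temp-file option can be sketched with plain JDK calls. This is only a minimal sketch, not PDFBox code: the class name {{TempFileSpooler}} and the method {{spoolToTempFile}} are made up for illustration, and the PDFBox calls mentioned in the comments (passing a {{RandomAccessRead}} to {{Loader.loadPDF}}) are assumed from the 3.0.x API and should be checked against the version in use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper (not part of PDFBox): spools an InputStream to a
 * temporary file so the PDF can be read from disk instead of the heap.
 */
final class TempFileSpooler {

    private TempFileSpooler() {
        // static helper, no instances
    }

    /**
     * Copies the whole stream into a new temp file and returns its path.
     * The caller owns the file and has to delete it when done.
     */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tempFile = Files.createTempFile("pdfbox-input-", ".pdf");
        try {
            // createTempFile already created the (empty) file, so overwrite it
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return tempFile;
        } catch (IOException e) {
            // do not leave a half-written temp file behind
            Files.deleteIfExists(tempFile);
            throw e;
        }
    }

    // Assumed usage with PDFBox 3.0.x (not compiled here, verify against your version):
    //
    //   Path tmp = TempFileSpooler.spoolToTempFile(inputStream);
    //   try (PDDocument doc = Loader.loadPDF(new RandomAccessReadBufferedFile(tmp.toFile()))) {
    //       ...
    //   } finally {
    //       Files.deleteIfExists(tmp);
    //   }
    //
    // For small files, the main-memory variant would instead wrap the stream
    // directly: Loader.loadPDF(new RandomAccessReadBuffer(inputStream)).
}
```

The temp file should only be deleted after the document has been closed, since the file-based reader needs it for the whole lifetime of the document.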

Only the third case, {{org.apache.pdfbox.io.MemoryUsageSetting.setupMixed()}}, i.e. the mixed usage of both, isn't supported any more. You have to decide which one to use, or provide your own cache by implementing the interface {{org.apache.pdfbox.io.RandomAccessRead}}.

> OutOfMemoryError when watermaking in 3.0.0-RC1
> ----------------------------------------------
>
>                 Key: PDFBOX-5462
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5462
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Marian Ion
>            Priority: Major
>         Attachments: TestPdfBox.tgz, my-pdf-test.jar
>
>
> I am using the Maven *3.0.0-RC1* version and I encounter the following error when watermarking a 5120 pages file:
> {quote} java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
> {quote}
>  
> However, the *2.0.26* version code works without problem!
> The code is basically this :
> {code:java}
> private static final PDFont PDF_FONT = PDType1Font.HELVETICA;
> memoryUsageSetting = MemoryUsageSetting.setupMixed(2 * ONE_GIGA, 40 * ONE_GIGA);
> //try (PDDocument pdfDocument = PDDocument.load(is, memoryUsageSetting)) {  // 2.0.26
> try (PDDocument pdfDocument = Loader.loadPDF(inputStream, memoryUsageSetting)) { // 3.0.0-RC1
> 	int nbPages = addWatermark(watermarkText, pdfDocument);
> 	pdfDocument.save(os);
> }
> ...
> private int addWatermark(String watermarkText, PDDocument document) throws IOException {
> 	int numberOfPages = document.getNumberOfPages();
> 	System.out.printf("Start adding watermark on a %d pages PDF document%n", numberOfPages);
> 	long start = System.nanoTime();
> 	int pageIndex = 0;
> 	for(PDPage page : document.getPages()) {
> 		++pageIndex;
> 		try (PDPageContentStream cs = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true)) {
> 			float width = page.getMediaBox().getWidth();
> 			float height = page.getMediaBox().getHeight();
> 			int rotation = page.getRotation();
> 			switch(rotation) {
> 				case 90:
> 					width = page.getMediaBox().getHeight();
> 					height = page.getMediaBox().getWidth();
> 					cs.transform(Matrix.getRotateInstance(Math.toRadians(90), height, 0));
> 					break;
> 				case 180:
> 					cs.transform(Matrix.getRotateInstance(Math.toRadians(180), width, height));
> 					break;
> 				case 270:
> 					width = page.getMediaBox().getHeight();
> 					height = page.getMediaBox().getWidth();
> 					cs.transform(Matrix.getRotateInstance(Math.toRadians(270), 0, width));
> 					break;
> 				default:
> 					break;
> 		}
> 		double stringWidth = (double)PDF_FONT.getStringWidth(watermarkText) / 1000 * FONT_HEIGHT;
> 		double diagonalLength = Math.sqrt((double)width * width + (double)height * height);
> 		double angle = Math.atan2(height, width);
> 		cs.transform(Matrix.getRotateInstance(angle, 0, 0));
> 		cs.setFont(PDF_FONT, (float)FONT_HEIGHT);
> 		//cs.setRenderingMode(RenderingMode.STROKE); // for "hollow" effect
> 		PDExtendedGraphicsState gs = new PDExtendedGraphicsState();
> 		gs.setNonStrokingAlphaConstant(0.2f);
> 		gs.setStrokingAlphaConstant(0.2f);
> 		gs.setBlendMode(BlendMode.MULTIPLY);
> 		cs.setGraphicsStateParameters(gs);
> 		// some API weirdness here. When int, range is 0..255.
> 		// when float, this would be 0..1f
> 		cs.setNonStrokingColor(0f, 0, 0);
> 		cs.setStrokingColor(0f, 0, 0); // black
> 		float x = (float)((diagonalLength - stringWidth) / 2); // "horizontal" position in rotated world
> 		float y = (float)(-FONT_HEIGHT / 4); // 4 is a trial-and-error thing, this lowers the text a bit
> 		cs.beginText();
> 		cs.newLineAtOffset(x, y);
> 		cs.showText(watermarkText);
> 		cs.endText();
> 		} finally {
> 				...
> 		}
> 	}
> 	return numberOfPages;
> }
> {code}


