Posted to users@pdfbox.apache.org by John Lussmyer <Co...@CasaDelGato.com> on 2022/03/17 18:15:39 UTC

Possible PDFBox bug?

We have an app that can generate multi-page PDF files. We recently ran into a problem where the library we were using would keep ALL the pages in memory. As a quick workaround, we have it write out single-page PDF files, then use PDFBox to combine them.

We recently found a bug in the way the pages get modified when they are combined into a single PDF.
When we generate the pages, the MediaBox sometimes starts at negative coordinates. When PDFBox adds such a page to a document, it offsets the content by that negative amount, which moves the page content up and to the right.

Our page-combining code looks like this:

		try (PDDocument doc = new PDDocument(MemoryUsageSetting.setupTempFileOnly())) {
			for (File pagFile : srcPages) {
				log.debug("make: page {}", pagFile.getAbsolutePath());
				PDPage page = new PDPage();
				doc.addPage(page);

				try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {

					try (PDDocument sourceDoc = Loader.loadPDF(pagFile, MemoryUsageSetting.setupTempFileOnly())) {
						PDPage srcPage = sourceDoc.getPage(0);
						page.setUserUnit(srcPage.getUserUnit());
						page.setMediaBox(srcPage.getMediaBox());
						page.setCropBox(srcPage.getCropBox());
						page.setTrimBox(srcPage.getTrimBox());

						// Create a Form XObject from the source document using LayerUtility
						LayerUtility layerUtility = new LayerUtility(doc);
						PDFormXObject form = layerUtility.importPageAsForm(sourceDoc, 0);
						// draw the full form
						contents.drawForm(form);
					}
				}
			}

			doc.save(outPDF);
		}

The original page PDF has a TrimBox[0,0,1296,864] and a MediaBox[-72,-72,1368,936].
The page in the PDFBox combined output has the same TrimBox and MediaBox, BUT the /Form1 it uses to place the contents has a BBox[-72,-72,1368,936] and a Matrix[1,0,0,1,72,72].
I'm not sure why it's adding a Matrix to offset the content.
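
To make the reported numbers concrete, here is a minimal sketch that applies the Matrix from the post to the corners of the form's BBox. It uses only java.awt.geom from the JDK, not PDFBox; the class name FormMatrixDemo and the helper apply are made up for illustration. It shows the arithmetic only and does not claim to reproduce what LayerUtility does internally.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class FormMatrixDemo {

    /**
     * Applies a PDF-style matrix [a b c d e f] to the point (x, y).
     * Hypothetical helper for illustration; not part of PDFBox.
     */
    static Point2D apply(double[] m, double x, double y) {
        // AffineTransform takes its six values in the same order as a PDF matrix.
        AffineTransform at = new AffineTransform(m[0], m[1], m[2], m[3], m[4], m[5]);
        return at.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        double[] matrix = {1, 0, 0, 1, 72, 72};      // Matrix reported on /Form1
        Point2D lowerLeft = apply(matrix, -72, -72);  // BBox lower-left corner
        Point2D upperRight = apply(matrix, 1368, 936); // BBox upper-right corner

        // The BBox corner (-72, -72) lands on (0, 0) and (1368, 936) on (1440, 1008).
        System.out.println(lowerLeft.getX() + ", " + lowerLeft.getY());
        System.out.println(upperRight.getX() + ", " + upperRight.getY());
    }
}
```

So the form is drawn as if its lower-left corner were at (0, 0), while the page's MediaBox still starts at (-72, -72). That difference is the 72-unit shift up and to the right described above.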

Re: Possible PDFBox bug?

Posted by "Hiller, Gerhard" <Ge...@msh.de>.
Hi John,

Try srcPage.getMediaBox().createRetranslatedRectangle(), and do the same for the other boxes.

The rectangle returned by srcPage.getMediaBox() reflects the negative coordinates.
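
As a rough illustration of what this suggestion does to the boxes from the original post, the sketch below models the rectangles as plain float arrays [llx, lly, urx, ury]. It assumes that createRetranslatedRectangle() returns a rectangle of the same size with its lower-left corner at (0, 0); the helpers retranslate and shift are hypothetical stand-ins, not PDFBox methods.

```java
import java.util.Arrays;

public class BoxShiftDemo {

    /** Same width and height, lower-left corner moved to (0, 0). Hypothetical helper. */
    static float[] retranslate(float[] box) {
        return new float[] {0f, 0f, box[2] - box[0], box[3] - box[1]};
    }

    /** Moves a box by (dx, dy) without changing its size. Hypothetical helper. */
    static float[] shift(float[] box, float dx, float dy) {
        return new float[] {box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy};
    }

    public static void main(String[] args) {
        float[] mediaBox = {-72f, -72f, 1368f, 936f}; // values from the original post
        float[] trimBox = {0f, 0f, 1296f, 864f};

        // MediaBox moved to the origin: [0, 0, 1440, 1008]
        System.out.println(Arrays.toString(retranslate(mediaBox)));

        // TrimBox moved by the same offset as the MediaBox: [72, 72, 1368, 936]
        System.out.println(Arrays.toString(shift(trimBox, -mediaBox[0], -mediaBox[1])));

        // TrimBox retranslated on its own keeps its old coordinates: [0, 0, 1296, 864]
        System.out.println(Arrays.toString(retranslate(trimBox)));
    }
}
```

One thing that may be worth checking: a box that already starts at (0, 0), like the TrimBox here, is unchanged by retranslation, so it would no longer sit 72 units inside the MediaBox. Shifting every box by the MediaBox offset keeps their relative positions.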

Greetings
Gerhard



