Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/02/05 05:22:00 UTC

[jira] [Closed] (PDFBOX-5098) rendering slow for some pages. Is there any way we can easily/roughly estimate how long it needs to convert PDF to image for a document/page? Anyway to make it faster?

     [ https://issues.apache.org/jira/browse/PDFBOX-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-5098.
-----------------------------------
    Resolution: Invalid

Please don't use the bug tracker for this.
https://pdfbox.apache.org/support.html

(And no, there is no way to estimate it)
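
Since the render time cannot be predicted, one option is to cap it per page instead. The following is only a minimal sketch using the standard java.util.concurrent API; the class name BoundedTask and the method runWithTimeout are hypothetical and not part of PDFBox, and the Callable stands in for whatever code renders one page. Note that cancelling a Future only sends an interrupt request, so a renderer that does not check for interruption may keep running in the background.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a PDFBox class.
public class BoundedTask {

    /**
     * Submits the task to the pool and waits at most timeoutMillis for it.
     * Returns Optional.empty() if the task did not finish in time.
     */
    public static <T> Optional<T> runWithTimeout(ExecutorService pool,
                                                 Callable<T> task,
                                                 long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<T> future = pool.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Best effort only: this interrupts the worker thread,
            // it does not forcibly stop the work.
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Stand-ins for "render one page"; real code would call the renderer here.
            Optional<String> fast = runWithTimeout(pool, () -> "fast page done", 1000);
            Optional<String> slow = runWithTimeout(pool, () -> {
                Thread.sleep(5000); // simulates a page that takes too long
                return "slow page done";
            }, 100);
            System.out.println("fast: " + fast + ", slow: " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A page that exceeds the limit could then be retried at a lower DPI or handed to a different converter; which fallback makes sense depends on the application.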

> rendering slow for some pages. Is there any way we can easily/roughly estimate how long it needs to convert PDF to image for a document/page? Anyway to make it faster?
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5098
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5098
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Yuguang Huang
>            Priority: Major
>
> We are converting PDF files into images. Our approach is to split a single PDF file into several PDDocument instances, one per page, and convert them in parallel.
>  
> What I found is that pages with more objects take much longer to process (see the logs below).
> I cannot share the test file for now; I will need to ask for permission.
> Is there a way to estimate the duration? And is there a way to make it faster? I also see the log messages below for the pages that require longer processing time.
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint getAnchorRect
> INFO: Pattern surface is too large, will be clipped
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint getAnchorRect
> INFO: width: 4405.8223, height: -4405.8223
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint getAnchorRect
> INFO: XStep: 1707.63, YStep: 1707.63
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint getAnchorRect
> INFO: bbox: [-54.8253,-217.611,1652.8,1490.02]
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint getAnchorRect
> INFO: pattern matrix: [2.58008,0.0,0.0,-2.58008,0.0,540.0]
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint getAnchorRect
> INFO: concatenated matrix: [2.58008,0.0,0.0,-2.58008,0.0,540.0]
>  
> [main] INFO doc.DocumentProcessorUtils - page 0 has 20 objs.
> [main] INFO doc.DocumentProcessorUtils - page 1 has 24 objs.
> [main] INFO doc.DocumentProcessorUtils - page 2 has 176 objs.
> [main] INFO doc.DocumentProcessorUtils - page 3 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 4 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 5 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 6 has 138 objs.
> [main] INFO doc.DocumentProcessorUtils - page 7 has 33 objs.
> [main] INFO doc.DocumentProcessorUtils - page 8 has 22 objs.
> [main] INFO doc.DocumentProcessorUtils - page 9 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 10 has 52 objs.
> [ForkJoinPool.commonPool-worker-10] INFO doc.Pdf2Image - Page 3 takes 0.803.
> [ForkJoinPool.commonPool-worker-13] INFO doc.Pdf2Image - Page 8 takes 0.805.
> [ForkJoinPool.commonPool-worker-8] INFO doc.Pdf2Image - Page 4 takes 0.822.
> [ForkJoinPool.commonPool-worker-15] INFO doc.Pdf2Image - Page 0 takes 0.852.
> [ForkJoinPool.commonPool-worker-11] INFO doc.Pdf2Image - Page 5 takes 0.892.
> [ForkJoinPool.commonPool-worker-4] INFO doc.Pdf2Image - Page 1 takes 0.901.
> [ForkJoinPool.commonPool-worker-6] INFO doc.Pdf2Image - Page 7 takes 0.962.
> [ForkJoinPool.commonPool-worker-2] INFO doc.Pdf2Image - Page 9 takes 1.075.
> [ForkJoinPool.commonPool-worker-1] INFO doc.Pdf2Image - Page 10 takes 73.145.
> [ForkJoinPool.commonPool-worker-9] INFO doc.Pdf2Image - Page 2 takes 201.11.
> [main] INFO doc.Pdf2Image - Page 6 takes 202.048.
> I also tried ImageMagick for the same conversion at the same DPI. This is what I get; it seems much faster for the pages with more objects.
> [main] INFO doc.DocumentProcessorUtils - page 0 has 20 objs.
> [main] INFO doc.DocumentProcessorUtils - page 1 has 24 objs.
> [main] INFO doc.DocumentProcessorUtils - page 2 has 176 objs.
> [main] INFO doc.DocumentProcessorUtils - page 3 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 4 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 5 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 6 has 138 objs.
> [main] INFO doc.DocumentProcessorUtils - page 7 has 33 objs.
> [main] INFO doc.DocumentProcessorUtils - page 8 has 22 objs.
> [main] INFO doc.DocumentProcessorUtils - page 9 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 10 has 52 objs.
> [ForkJoinPool.commonPool-worker-2] INFO doc.ProcessDoc - Page 9 takes 1.684.
> [ForkJoinPool.commonPool-worker-11] INFO doc.ProcessDoc - Page 1 takes 2.081.
> [ForkJoinPool.commonPool-worker-8] INFO doc.ProcessDoc - Page 5 takes 2.095.
> [ForkJoinPool.commonPool-worker-4] INFO doc.ProcessDoc - Page 8 takes 2.208.
> [ForkJoinPool.commonPool-worker-15] INFO doc.ProcessDoc - Page 7 takes 2.336.
> [ForkJoinPool.commonPool-worker-10] INFO doc.ProcessDoc - Page 3 takes 2.443.
> [ForkJoinPool.commonPool-worker-13] INFO doc.ProcessDoc - Page 4 takes 2.485.
> [ForkJoinPool.commonPool-worker-6] INFO doc.ProcessDoc - Page 0 takes 3.722.
> [ForkJoinPool.commonPool-worker-1] INFO doc.ProcessDoc - Page 10 takes 3.765.
> [main] INFO doc.ProcessDoc - Page 6 takes 4.479.
> [ForkJoinPool.commonPool-worker-9] INFO doc.ProcessDoc - Page 2 takes 4.51.
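
The TilingPaint INFO lines quoted in the report have the layout of the default java.util.logging formatter. Assuming the messages really are routed through java.util.logging (an assumption based only on that layout), the sketch below raises the threshold for that one logger so the INFO messages are dropped. The class name QuietTilingPaint is hypothetical. This only reduces log noise; it is not expected to change rendering time.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not a PDFBox class.
public class QuietTilingPaint {

    // Logger name taken from the log lines in the report.
    static final String LOGGER_NAME = "org.apache.pdfbox.rendering.TilingPaint";

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an unreferenced logger can be lost.
    private static final Logger TILING_PAINT = Logger.getLogger(LOGGER_NAME);

    /** Drops INFO and below for this logger; WARNING and above still pass. */
    public static void raiseThreshold() {
        TILING_PAINT.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        raiseThreshold();
        System.out.println("INFO enabled: " + TILING_PAINT.isLoggable(Level.INFO));
        System.out.println("WARNING enabled: " + TILING_PAINT.isLoggable(Level.WARNING));
    }
}
```

If the application uses a different logging backend, the equivalent setting belongs in that backend's configuration instead.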


