Posted to users@pdfbox.apache.org by Ethan Huang <yu...@gmail.com> on 2021/02/05 19:17:18 UTC

Rendering slow for some pages. Anyway to make it faster?

We are converting PDF files into images. Our approach is to split a single
PDF file into several PDDocument objects, one per page, and convert them in
parallel.
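
For readers who want to reproduce per-page timings like the ones in the logs further down, here is a minimal sketch. It assumes a parallel stream on the common ForkJoinPool (the thread names in the logs are consistent with that, but this is a guess), and `renderPage` is a hypothetical stand-in for the actual PDFBox rendering call, which is not shown in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

final class PageTimer {
    // Runs renderPage once per page index in a parallel stream and
    // records how long each call took, in seconds.
    static Map<Integer, Double> timePages(int pageCount, IntConsumer renderPage) {
        Map<Integer, Double> seconds = new ConcurrentHashMap<>();
        IntStream.range(0, pageCount).parallel().forEach(page -> {
            long start = System.nanoTime();
            renderPage.accept(page);
            seconds.put(page, (System.nanoTime() - start) / 1e9);
        });
        return seconds;
    }

    public static void main(String[] args) {
        // Hypothetical workload: replace the sleep with the real per-page rendering.
        Map<Integer, Double> result = timePages(11, page -> {
            try {
                Thread.sleep(10L * (page + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        result.forEach((page, s) -> System.out.printf("Page %d takes %.3f.%n", page, s));
    }
}
```

Note that with this kind of setup the total wall-clock time cannot drop below the time of the slowest single page, however many worker threads are available.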



What I found is that for pages with more objects, processing takes much
longer (see the logs below; times are in seconds).

I cannot share the test file for now. I will need to ask for permission.

Is there a way to make it faster? Also, I see the following log messages for
the pages that take longer to process:

Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
getAnchorRect
INFO: Pattern surface is too large, will be clipped
Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
getAnchorRect
INFO: width: 4405.8223, height: -4405.8223
Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
getAnchorRect
INFO: XStep: 1707.63, YStep: 1707.63
Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
getAnchorRect
INFO: bbox: [-54.8253,-217.611,1652.8,1490.02]
Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
getAnchorRect
INFO: pattern matrix: [2.58008,0.0,0.0,-2.58008,0.0,540.0]
Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
getAnchorRect
INFO: concatenated matrix: [2.58008,0.0,0.0,-2.58008,0.0,540.0]
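
The numbers in these messages appear to be related by simple arithmetic: multiplying XStep and YStep by the scale entries of the pattern matrix comes out close to the logged width and height. The sketch below checks that. The formula is an assumption inferred from the logged values, not taken from the PDFBox source, and the class and method names are made up for illustration.

```java
final class TilingExtent {
    // Assumed relation: extent of one pattern cell = step * matrix scale.
    static double scaledStep(double step, double scale) {
        return step * scale;
    }

    public static void main(String[] args) {
        double xStep = 1707.63;    // XStep from the log
        double yStep = 1707.63;    // YStep from the log
        double scaleX = 2.58008;   // first entry of the pattern matrix
        double scaleY = -2.58008;  // fourth entry of the pattern matrix
        System.out.println("width:  " + scaledStep(xStep, scaleX));
        System.out.println("height: " + scaledStep(yStep, scaleY));
    }
}
```

If that reading is right, a single pattern cell covers an area of roughly 4400 by 4400 units, which would explain why the pattern surface is reported as too large and gets clipped.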


Logs showing the object count and processing duration per page for the file
with PDFBox:


[main] INFO doc.DocumentProcessorUtils - page 0 has 20 objs.
[main] INFO doc.DocumentProcessorUtils - page 1 has 24 objs.
[main] INFO doc.DocumentProcessorUtils - page 2 has 176 objs.
[main] INFO doc.DocumentProcessorUtils - page 3 has 21 objs.
[main] INFO doc.DocumentProcessorUtils - page 4 has 26 objs.
[main] INFO doc.DocumentProcessorUtils - page 5 has 21 objs.
[main] INFO doc.DocumentProcessorUtils - page 6 has 138 objs.
[main] INFO doc.DocumentProcessorUtils - page 7 has 33 objs.
[main] INFO doc.DocumentProcessorUtils - page 8 has 22 objs.
[main] INFO doc.DocumentProcessorUtils - page 9 has 26 objs.
[main] INFO doc.DocumentProcessorUtils - page 10 has 52 objs.

[ForkJoinPool.commonPool-worker-10] INFO doc.Pdf2Image - Page 3 takes 0.803.
[ForkJoinPool.commonPool-worker-13] INFO doc.Pdf2Image - Page 8 takes 0.805.
[ForkJoinPool.commonPool-worker-8] INFO doc.Pdf2Image - Page 4 takes 0.822.
[ForkJoinPool.commonPool-worker-15] INFO doc.Pdf2Image - Page 0 takes 0.852.
[ForkJoinPool.commonPool-worker-11] INFO doc.Pdf2Image - Page 5 takes 0.892.
[ForkJoinPool.commonPool-worker-4] INFO doc.Pdf2Image - Page 1 takes 0.901.
[ForkJoinPool.commonPool-worker-6] INFO doc.Pdf2Image - Page 7 takes 0.962.
[ForkJoinPool.commonPool-worker-2] INFO doc.Pdf2Image - Page 9 takes 1.075.
[ForkJoinPool.commonPool-worker-1] INFO doc.Pdf2Image - Page 10 takes 73.145.
[ForkJoinPool.commonPool-worker-9] INFO doc.Pdf2Image - Page 2 takes 201.11.
[main] INFO doc.Pdf2Image - Page 6 takes 202.048.

I also tried ImageMagick for the same task at the same DPI. It is much
faster for the pages with more objects, although it is a bit slower than
PDFBox for the other pages:

[main] INFO doc.DocumentProcessorUtils - page 0 has 20 objs.
[main] INFO doc.DocumentProcessorUtils - page 1 has 24 objs.
[main] INFO doc.DocumentProcessorUtils - page 2 has 176 objs.
[main] INFO doc.DocumentProcessorUtils - page 3 has 21 objs.
[main] INFO doc.DocumentProcessorUtils - page 4 has 26 objs.
[main] INFO doc.DocumentProcessorUtils - page 5 has 21 objs.
[main] INFO doc.DocumentProcessorUtils - page 6 has 138 objs.
[main] INFO doc.DocumentProcessorUtils - page 7 has 33 objs.
[main] INFO doc.DocumentProcessorUtils - page 8 has 22 objs.
[main] INFO doc.DocumentProcessorUtils - page 9 has 26 objs.
[main] INFO doc.DocumentProcessorUtils - page 10 has 52 objs.
[ForkJoinPool.commonPool-worker-2] INFO doc.ProcessDoc - Page 9 takes 1.684.
[ForkJoinPool.commonPool-worker-11] INFO doc.ProcessDoc - Page 1 takes 2.081.
[ForkJoinPool.commonPool-worker-8] INFO doc.ProcessDoc - Page 5 takes 2.095.
[ForkJoinPool.commonPool-worker-4] INFO doc.ProcessDoc - Page 8 takes 2.208.
[ForkJoinPool.commonPool-worker-15] INFO doc.ProcessDoc - Page 7 takes 2.336.
[ForkJoinPool.commonPool-worker-10] INFO doc.ProcessDoc - Page 3 takes 2.443.
[ForkJoinPool.commonPool-worker-13] INFO doc.ProcessDoc - Page 4 takes 2.485.
[ForkJoinPool.commonPool-worker-6] INFO doc.ProcessDoc - Page 0 takes 3.722.
[ForkJoinPool.commonPool-worker-1] INFO doc.ProcessDoc - Page 10 takes 3.765.
[main] INFO doc.ProcessDoc - Page 6 takes 4.479.
[ForkJoinPool.commonPool-worker-9] INFO doc.ProcessDoc - Page 2 takes 4.51.
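
To put the two sets of timings side by side, the log lines can be parsed and divided page by page. This is only a rough sketch: the class name and the regular expression are illustrative, and the sample input is a handful of lines copied from the logs above rather than the full output.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimingCompare {
    // Matches "Page N takes S", also when the number wrapped onto the next line.
    private static final Pattern LINE =
            Pattern.compile("Page (\\d+) takes\\s+(\\d+(?:\\.\\d+)?)");

    // Extracts page index -> seconds from raw log text.
    static Map<Integer, Double> parse(String log) {
        Map<Integer, Double> out = new TreeMap<>();
        Matcher m = LINE.matcher(log);
        while (m.find()) {
            out.put(Integer.parseInt(m.group(1)), Double.parseDouble(m.group(2)));
        }
        return out;
    }

    // How many times longer the first tool took than the second for one page.
    static double ratio(Map<Integer, Double> a, Map<Integer, Double> b, int page) {
        return a.get(page) / b.get(page);
    }

    public static void main(String[] args) {
        Map<Integer, Double> pdfbox = parse(
                "Page 9 takes 1.075.\nPage 10 takes\n73.145.\nPage 2 takes 201.11.");
        Map<Integer, Double> imageMagick = parse(
                "Page 9 takes 1.684.\nPage 10 takes\n3.765.\nPage 2 takes 4.51.");
        for (int page : pdfbox.keySet()) {
            if (imageMagick.containsKey(page)) {
                System.out.printf("Page %d: PDFBox/ImageMagick = %.2f%n",
                        page, ratio(pdfbox, imageMagick, page));
            }
        }
    }
}
```

With these sample lines, page 2 comes out roughly 45 times slower with PDFBox, while page 9 is faster with PDFBox.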

Re: Rendering slow for some pages. Anyway to make it faster?

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

I forgot one thing: you can activate subsampling with
PDFRenderer.setSubsamplingAllowed(). In some cases (large images) this
will make things faster, with a slight quality loss.

(However, there was a bug, see
https://issues.apache.org/jira/browse/PDFBOX-5091 , so try with the
snapshot link mentioned at the bottom or with 2.0.20 to see if your file
gets faster.)

You could also try setting low-quality rendering hints to see what
happens.
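
As a rough illustration of what low-quality hints could look like, the sketch below builds a java.awt.RenderingHints object that favours speed. The particular choice of hints is an assumption, not a recommendation from this thread, and turning antialiasing off in particular can reduce quality visibly. The PDFBox calls appear only in comments so that the example runs without PDFBox on the classpath.

```java
import java.awt.RenderingHints;

final class LowQualityHints {
    // Builds a set of hints that trades quality for speed.
    static RenderingHints speedHints() {
        RenderingHints hints = new RenderingHints(
                RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_SPEED);
        hints.put(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
        hints.put(RenderingHints.KEY_ANTIALIASING,
                RenderingHints.VALUE_ANTIALIAS_OFF);
        return hints;
    }

    public static void main(String[] args) {
        RenderingHints hints = speedHints();
        // With PDFBox available, these would be handed to the renderer:
        //   renderer.setRenderingHints(hints);
        //   renderer.setSubsamplingAllowed(true);
        // where renderer is a PDFRenderer for the document being converted.
        System.out.println(hints.get(RenderingHints.KEY_RENDERING));
    }
}
```

Whether this helps depends on the file, so it is worth timing the slow pages with and without the hints before settling on them.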

Things that usually make rendering slow:
- thousands of images
- huge shadings
- very complex clipping paths

Tilman

On 05.02.2021 at 22:45, Ethan Huang wrote:
> Thanks for the info! We would prefer to continue with PDFBox if possible.
> Lowering resolution would bring bad user experience for us.
>
> I am requesting for sharing the files. Once available, I am going to share
> them here.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Rendering slow for some pages. Anyway to make it faster?

Posted by Ethan Huang <yu...@gmail.com>.
Thanks for the info! We would prefer to continue with PDFBox if possible.
Lowering the resolution would result in a bad user experience for us.

I am requesting permission to share the files. Once they are available, I
will share them here.

On Fri, Feb 5, 2021 at 11:30 AM Tilman Hausherr <TH...@t-online.de>
wrote:

> No, not really. You could lower the resolution.

Re: Rendering slow for some pages. Anyway to make it faster?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 05.02.2021 at 20:17, Ethan Huang wrote:
> We are converting PDF files into images and the way we are doing it is
> breaking a single PDF files into several PDDocument, one per page, and
> converting them in parallel.
>
>
>
> What I found is for pages with more objects, the processing is going to
> take much longer (see below logs, time unit in seconds).
>
> I cannot share the test file for now. I will need to ask for permission.


Please do so. It is unlikely, but sometimes we do find potential for
optimization in PDFBox when a file is slow. However, in most cases we can't
help.


>
> Is there a way to make it faster? Also I see the logs for pages requiring
> longer processing time.

No, not really. You could lower the resolution.


>
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
> getAnchorRect
> INFO: Pattern surface is too large, will be clipped
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
> getAnchorRect
> INFO: width: 4405.8223, height: -4405.8223
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
> getAnchorRect
> INFO: XStep: 1707.63, YStep: 1707.63
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
> getAnchorRect
> INFO: bbox: [-54.8253,-217.611,1652.8,1490.02]
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
> getAnchorRect
> INFO: pattern matrix: [2.58008,0.0,0.0,-2.58008,0.0,540.0]
> Feb 04, 2021 5:39:20 PM org.apache.pdfbox.rendering.TilingPaint
> getAnchorRect
> INFO: concatenated matrix: [2.58008,0.0,0.0,-2.58008,0.0,540.0]
>
>
> Logs showing objects count and processing duration per page for the file
> with PDFBox:
>
>
> [main] INFO doc.DocumentProcessorUtils - page 0 has 20 objs.
> [main] INFO doc.DocumentProcessorUtils - page 1 has 24 objs.
> [main] INFO doc.DocumentProcessorUtils - page 2 has 176 objs.
> [main] INFO doc.DocumentProcessorUtils - page 3 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 4 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 5 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 6 has 138 objs.
> [main] INFO doc.DocumentProcessorUtils - page 7 has 33 objs.
> [main] INFO doc.DocumentProcessorUtils - page 8 has 22 objs.
> [main] INFO doc.DocumentProcessorUtils - page 9 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 10 has 52 objs.
>
> [ForkJoinPool.commonPool-worker-10] INFO doc.Pdf2Image - Page 3 takes 0.803.
> [ForkJoinPool.commonPool-worker-13] INFO doc.Pdf2Image - Page 8 takes 0.805.
> [ForkJoinPool.commonPool-worker-8] INFO doc.Pdf2Image - Page 4 takes 0.822.
> [ForkJoinPool.commonPool-worker-15] INFO doc.Pdf2Image - Page 0 takes 0.852.
> [ForkJoinPool.commonPool-worker-11] INFO doc.Pdf2Image - Page 5 takes 0.892.
> [ForkJoinPool.commonPool-worker-4] INFO doc.Pdf2Image - Page 1 takes 0.901.
> [ForkJoinPool.commonPool-worker-6] INFO doc.Pdf2Image - Page 7 takes 0.962.
> [ForkJoinPool.commonPool-worker-2] INFO doc.Pdf2Image - Page 9 takes 1.075.
> [ForkJoinPool.commonPool-worker-1] INFO doc.Pdf2Image - Page 10 takes
> 73.145.
> [ForkJoinPool.commonPool-worker-9] INFO doc.Pdf2Image - Page 2 takes 201.11.
> [main] INFO doc.Pdf2Image - Page 6 takes 202.048.


I don't think there is a correlation between the number of objects and 
the rendering time.


>
> Also I tried to use ImageMagick to do the same thing with the same DPI and
> this is what I get, which seems much faster for pages with more objects,
> although it is a bit slower than PDFBox for other pages.
>
> [main] INFO doc.DocumentProcessorUtils - page 0 has 20 objs.
> [main] INFO doc.DocumentProcessorUtils - page 1 has 24 objs.
> [main] INFO doc.DocumentProcessorUtils - page 2 has 176 objs.
> [main] INFO doc.DocumentProcessorUtils - page 3 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 4 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 5 has 21 objs.
> [main] INFO doc.DocumentProcessorUtils - page 6 has 138 objs.
> [main] INFO doc.DocumentProcessorUtils - page 7 has 33 objs.
> [main] INFO doc.DocumentProcessorUtils - page 8 has 22 objs.
> [main] INFO doc.DocumentProcessorUtils - page 9 has 26 objs.
> [main] INFO doc.DocumentProcessorUtils - page 10 has 52 objs.
> [ForkJoinPool.commonPool-worker-2] INFO doc.ProcessDoc - Page 9 takes 1.684.
> [ForkJoinPool.commonPool-worker-11] INFO doc.ProcessDoc - Page 1 takes
> 2.081.
> [ForkJoinPool.commonPool-worker-8] INFO doc.ProcessDoc - Page 5 takes 2.095.
> [ForkJoinPool.commonPool-worker-4] INFO doc.ProcessDoc - Page 8 takes 2.208.
> [ForkJoinPool.commonPool-worker-15] INFO doc.ProcessDoc - Page 7 takes
> 2.336.
> [ForkJoinPool.commonPool-worker-10] INFO doc.ProcessDoc - Page 3 takes
> 2.443.
> [ForkJoinPool.commonPool-worker-13] INFO doc.ProcessDoc - Page 4 takes
> 2.485.
> [ForkJoinPool.commonPool-worker-6] INFO doc.ProcessDoc - Page 0 takes 3.722.
> [ForkJoinPool.commonPool-worker-1] INFO doc.ProcessDoc - Page 10 takes
> 3.765.
> [main] INFO doc.ProcessDoc - Page 6 takes 4.479.
> [ForkJoinPool.commonPool-worker-9] INFO doc.ProcessDoc - Page 2 takes 4.51.
>
ImageMagick uses Ghostscript, which is written in C, and they're 10
years ahead of us. IMHO they are the best, just below Adobe.

Tilman

