Posted to users@pdfbox.apache.org by Manuel Fomitescu <ma...@gmail.com> on 2016/09/28 13:02:32 UTC

Suggestion for improvement

Hello,

I want to use PDFBox in my project instead of the Adobe PDFL C library.
For one task I have to calculate the width/height of the first page of the
document. I used the following code:

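import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// try-with-resources closes the document when done
try (PDDocument document = PDDocument.load(new File(args[0]))) {
    PDPage pdPage = document.getPage(0);
    System.out.println("PDFBOX - NoPage: " + document.getNumberOfPages());
    System.out.println("PDFBOX - FirstPage Height: " + pdPage.getMediaBox().getHeight());
    System.out.println("PDFBOX - FirstPage Width: " + pdPage.getMediaBox().getWidth());
}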


To obtain the same thing with PDFL I run a command line with some
parameters and read the response from a file, which is uglier from Java code.

But the performance is a big problem.

With a 90 MB PDF document I measured 400 ms with PDFBox and 50 ms with
PDFL. For a 1.7 GB document I measured 47106 ms with PDFBox and 151 ms
with PDFL. These are very big differences.
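
Figures like these can be collected with a small wall-clock timer. The sketch below is only an illustration: Stopwatch and timeMillis are hypothetical names, not part of PDFBox or PDFL, and this is not necessarily how the numbers above were measured.

```java
import java.util.concurrent.Callable;

class Stopwatch {
    // Hypothetical helper: runs the task once and returns the elapsed
    // wall-clock time in milliseconds, measured with System.nanoTime().
    static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

A single run like this also includes JVM warm-up and operating system file caching, so repeating the measurement several times gives more reliable numbers.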

The main problem is that to access the first page I have to load the
entire document first.
PDFL has a constructor for a document with a page parameter and loads
only that page from the document. That is why it is so fast.

Best regards,
Manuel.

Re: Suggestion for improvement

Posted by Tilman Hausherr <TH...@t-online.de>.
On 28.09.2016 at 15:02, Manuel Fomitescu wrote:
> Hello,
>
> I want to use PDFBox in my project instead of the Adobe PDFL C library.
> For one task I have to calculate the width/height of the first page of the
> document. I used the following code:
>
> import java.io.File;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
>
> // try-with-resources closes the document when done
> try (PDDocument document = PDDocument.load(new File(args[0]))) {
>     PDPage pdPage = document.getPage(0);
>     System.out.println("PDFBOX - NoPage: " + document.getNumberOfPages());
>     System.out.println("PDFBOX - FirstPage Height: " + pdPage.getMediaBox().getHeight());
>     System.out.println("PDFBOX - FirstPage Width: " + pdPage.getMediaBox().getWidth());
> }
>
>
> To obtain the same thing with PDFL I run a command line with some
> parameters and read the response from a file, which is uglier from Java code.
>
> But the performance is a big problem.
>
> With a 90 MB PDF document I measured 400 ms with PDFBox and 50 ms with
> PDFL. For a 1.7 GB document I measured 47106 ms with PDFBox and 151 ms
> with PDFL. These are very big differences.
>
> The main problem is that to access the first page I have to load the
> entire document first.
> PDFL has a constructor for a document with a page parameter and loads
> only that page from the document. That is why it is so fast.
>
> Best regards,
> Manuel.
>

This is a known problem that can't be solved in a few hours / days. 
PDFBox does not "parse on demand".
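
To illustrate what "parse on demand" can mean in general: a PDF file ends with a startxref keyword followed by the byte offset of its cross-reference section, so a reader can seek to the end of the file and locate objects from there without reading everything before it. The following is a minimal sketch of that first step only, assuming the keyword lies within the last 1024 bytes. TailReader and findStartXref are hypothetical names, and this is not PDFBox or PDFL code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class TailReader {
    // Hypothetical helper: returns the number written after the last
    // "startxref" keyword, reading at most the final 1024 bytes of the file.
    static long findStartXref(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            int tailSize = (int) Math.min(1024, length);
            byte[] tail = new byte[tailSize];
            raf.seek(length - tailSize); // jump straight to the tail
            raf.readFully(tail);

            // ISO-8859-1 maps each byte to exactly one char
            String text = new String(tail, StandardCharsets.ISO_8859_1);
            int keyword = text.lastIndexOf("startxref");
            if (keyword < 0) {
                throw new IOException("no startxref in the last " + tailSize + " bytes");
            }

            // Skip whitespace after the keyword, then collect the digits
            int i = keyword + "startxref".length();
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && Character.isDigit(text.charAt(i))) {
                i++;
            }
            if (start == i) {
                throw new IOException("no offset after startxref");
            }
            return Long.parseLong(text.substring(start, i));
        }
    }
}
```

The cost of this step depends on the size of the tail, not on the size of the file. A real reader would still have to follow the cross-reference data to the page tree, and handle incremental updates, compressed cross-reference streams and damaged files.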

Tilman

