Posted to users@pdfbox.apache.org by Daniel Manzke <da...@googlemail.com> on 2008/11/11 12:49:10 UTC

Performance Tuning

Hi there,
first I have to say: "Good job." It's a really nice project.


I'm using PDFBox to convert PDF pages to images, but I have some
performance issues, so I had a look at the source code. I saw several places
where I could save time. ;)
Are you interested in them?
I would prefer to discuss them first, because the current behavior may be
by design and my changes might be dangerous.



Best Regards,
Daniel

Re: PDFBox needs you!

Posted by Rainer Schwarze <rs...@admadic.de>.
Jeremias Maerki wrote:
[...]
> That's not a call to just you, Daniel, but to everyone on this list.
> Andreas Lehmkühler, too. Every PDFBox user. Please take the opportunity
> to shape PDFBox's future, to make it even better!

Hi,

I'm using PDFBox and will - as time permits - work on the project. Maybe
the most fitting approach for me currently is to spend some spare
minutes every now and then looking through the bug reports, adding
comments, or submitting patches.

best wishes, Rainer
-- 

PDFBox needs you! (was: Re: Performance Tuning)

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Daniel,

I'm speaking here as a mentor for PDFBox incubation. One of the most
important goals of the incubation of PDFBox here at the ASF is to build
a community. Everyone is invited to help out, any way she/he can.
Answering questions: great! Writing documentation, code and submitting
patches: even better! Unfortunately, PDFBox hasn't seen much developer
attention lately. So PDFBox needs every helping hand. Please continue
what you're doing: discuss improvements, collectively decide which roads
to take, write code and submit patches via JIRA. Hopefully, they will be
reviewed and processed in a timely manner. If not (sadly, that might
happen, as there are no active committers at the moment, which probably leaves
the job to the mentors for now), please just yell again. Once we can see
someone contributing on a regular basis, we'll see to it that they can
become a committer with write access to the code repository. Only
through that will PDFBox be able to graduate.

That's not a call to just you, Daniel, but to everyone on this list.
Andreas Lehmkühler, too. Every PDFBox user. Please take the opportunity
to shape PDFBox's future, to make it even better!

On 11.11.2008 16:37:26 Daniel Manzke wrote:
<snip/>
> Are you a developer of the pdfbox? Do you need help? ;)
<snip/>


Thanks for listening,
Jeremias Maerki (PDFBox mentor)


Re: Performance Tuning

Posted by Daniel Manzke <da...@googlemail.com>.
Alright,
I did a quick-and-dirty fix for myself. I've extended the PageDrawer class with
a boolean attribute "drawImage". The Invoker class now checks whether the
images should be rendered. It's working quite well.
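
A minimal sketch of the flag idea described above, for readers who want to see its shape. It does not use the PDFBox classes: SketchPageDrawer and SketchInvokeOperator are hypothetical stand-ins, and the real PageDrawer and image operator have different signatures. The only point illustrated is that the operator asks the drawer for the flag and returns early when images are switched off.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a page drawer; NOT the real PDFBox PageDrawer.
class SketchPageDrawer {
    private final boolean drawImage;
    private final List<String> rendered = new ArrayList<String>();

    SketchPageDrawer(boolean drawImage) {
        this.drawImage = drawImage;
    }

    boolean isDrawImage() {
        return drawImage;
    }

    // Records what would have been painted, so the effect of the flag is visible.
    void record(String what) {
        rendered.add(what);
    }

    List<String> getRendered() {
        return rendered;
    }
}

// Hypothetical stand-in for the operator that handles embedded images.
class SketchInvokeOperator {
    void process(SketchPageDrawer drawer, String imageName) {
        if (!drawer.isDrawImage()) {
            return; // images are switched off: skip decoding and painting entirely
        }
        drawer.record("image:" + imageName);
    }
}

class DrawImageFlagDemo {
    public static void main(String[] args) {
        SketchInvokeOperator invoke = new SketchInvokeOperator();
        SketchPageDrawer withImages = new SketchPageDrawer(true);
        SketchPageDrawer withoutImages = new SketchPageDrawer(false);

        invoke.process(withImages, "cover");
        invoke.process(withoutImages, "cover");

        System.out.println(withImages.getRendered());    // prints [image:cover]
        System.out.println(withoutImages.getRendered()); // prints []
    }
}
```

Keeping the flag on the drawer rather than on the operator means each rendering run can decide for itself whether images are wanted.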


In the meantime I read the mail about contributing to PDFBox. Jeremias, I will
try to help where I can. I did this for some time for the JAX-WS project.


Let's see where we can go. ;)


Nice evening (in Germany ;)),
Daniel

-- 
Kind regards

Daniel Manzke

AW: Performance Tuning

Posted by An...@rwe.com.
> java5 (just optional):
> Source code with java 5 features. By the way, is the pdf model of the specification so bad, because I saw a lot of "instanceof".
Hmm, as far as I know, Java 5 features aren't used because they would break compatibility with earlier Java versions. But perhaps a future version will be based on Java 5.


> Are you a developer of the pdfbox? Do you need help? ;)
Sort of. I'm using PDFBox in conjunction with a layout engine to generate PDF documents for printing and archiving. Therefore I have to look at the source to understand the whole thing. I've already provided some minor patches to PDFBox some time ago.

I'll try to support the PDFBox team as far as I can.


Bye,
Andreas


----------------------------------------------------------------
- Management: Chittur Ramakrishnan (Chairman), Stefan Niehusmann -
- Registered office: Dortmund -
- Registered at the Dortmund District Court -
- Commercial register no. HR B 21222 -
- VAT ID no. DE 2588 96 719 -

Re: Performance Tuning

Posted by Daniel Manzke <da...@googlemail.com>.
Yeah, what a hint!

For my PDF document, which has a lot of images on the first page (the cover),
this could be a way. Now I need just 2 seconds for rendering instead of 19
seconds! Hmm, I should check whether there could be a standard way to do this,
maybe with a parameter or something similar.

Java 5 (just optional):
I meant source code that uses Java 5 features. By the way, is the PDF model in
the specification really that bad? I ask because I saw a lot of "instanceof".


Are you a developer of PDFBox? Do you need help? ;)


Bye,
Daniel

I hate the winter; it feels like the workday is over, but it's only early
afternoon. :)




AW: Performance Tuning

Posted by An...@rwe.com.
> Sharing a BufferedImage:
> I tried this and got a performance boost, but then I asked myself whether I would ever read a PDF with different page sizes.
> Due to some scaling problems in my implementation, I just commented the code out and thought that I
> should give this mailing list a try. :)
We'll see ...


> Is there a way to convert a pdf page to an image without the images? I couldn't figure out how to do it.
No, there is nothing like that. Have a look at org.pdfbox.util.operator.pagedrawer.Invoke.process(). There you'll find the image handling.


> "Bad question": Is there a Java 5 version? :)
There are no bad or stupid questions, only bad or stupid answers.
What exactly are you looking for? A version compiled for Java 5, or a version that uses Java 5 features?




Re: Performance Tuning

Posted by Daniel Manzke <da...@googlemail.com>.
Hey Andreas,
yeah, I tried my ideas as quick-and-dirty implementations.

Sharing the PageWriter:
In my version I'm sharing the PageWriter. I just create an instance in my
code and pass it to my own convertToImage(PageWriter). ;) Just for testing
purposes. But the performance gain is not that big.

Sharing a BufferedImage:
I tried this and got a performance boost, but then I asked myself whether I
would ever read a PDF with different page sizes. Due to some scaling problems
in my implementation, I just commented the code out and thought that I
should give this mailing list a try. :)
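
One way the shared-image idea could cope with differing page sizes is sketched below, using only java.awt classes. ReusablePageCanvas is a hypothetical helper, not something that exists in PDFBox, and the pixel sizes are made up. The image is reused and cleared with fillRect() when the requested size matches the previous page, and reallocated otherwise.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of PDFBox): hands out a blank canvas per page.
class ReusablePageCanvas {
    private BufferedImage image;

    // Returns a white image of the requested size. The previous image is reused
    // when the size matches; a page with a different size gets a new image.
    BufferedImage acquire(int width, int height) {
        if (image == null || image.getWidth() != width || image.getHeight() != height) {
            image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        }
        Graphics2D g = image.createGraphics();
        try {
            // Clearing is needed in both cases: a new TYPE_INT_RGB image starts
            // out black, and a reused one still holds the previous page.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
        } finally {
            g.dispose();
        }
        return image;
    }
}

class ReusablePageCanvasDemo {
    public static void main(String[] args) {
        ReusablePageCanvas canvas = new ReusablePageCanvas();
        BufferedImage first = canvas.acquire(600, 800);
        BufferedImage second = canvas.acquire(600, 800);  // same size: reused
        BufferedImage rotated = canvas.acquire(800, 600); // different size: reallocated
        System.out.println(first == second);   // prints true
        System.out.println(second == rotated); // prints false
    }
}
```

The caller has to finish with the returned image (write it to disk or copy it) before asking for the next page, because the same pixels are overwritten on the next call.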


Is there a way to convert a pdf page to an image without the images? I
couldn't figure out how to do it.


"Bad question":
Is there a Java 5 version? :)


Thanks for your help so far,
Daniel





AW: Performance Tuning

Posted by An...@rwe.com.
> 1. PageWriter:
> Every time the convertToImage method is called there will be a new PageDrawer created
> and the parent class PDFStreamEngine loads the resource bundles.
>
> Why not use some kind of a cache for loading the bundles? (Load resources just once)
It's a bit complicated. As I understand it, the resources contain the operator-to-class mapping. Every entry is linked to the PDFStreamEngine as a sort of callback mechanism. Consequently, every instance of a PDFStreamEngine needs its own mapping.
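
The two concerns above can be separated, at least in principle: the parsed mapping table can be loaded once and shared, while the operator objects that point back to an engine are still created per engine. The following is a sketch under that assumption and not the PDFBox code; OperatorTable, BoundOperator and MappingEngine are hypothetical names, and the two-entry table is an in-memory stand-in for the real resource file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical shared table: operator name -> handler name, parsed only once.
class OperatorTable {
    static int loadCount = 0; // only here to make the caching observable
    private static Properties cached;

    static synchronized Properties get() {
        if (cached == null) {
            Properties p = new Properties();
            try {
                // Stand-in for reading the mapping from a resource file.
                p.load(new StringReader("Tj=ShowText\nDo=Invoke\n"));
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
            loadCount++;
            cached = p;
        }
        return cached;
    }
}

// Hypothetical operator object that keeps a back-reference to its engine.
class BoundOperator {
    final String handlerName;
    final MappingEngine owner;

    BoundOperator(String handlerName, MappingEngine owner) {
        this.handlerName = handlerName;
        this.owner = owner;
    }
}

// Hypothetical engine: shares the table, but owns its operator instances.
class MappingEngine {
    private final Map<String, BoundOperator> operators = new HashMap<String, BoundOperator>();

    MappingEngine() {
        Properties table = OperatorTable.get();
        for (String name : table.stringPropertyNames()) {
            operators.put(name, new BoundOperator(table.getProperty(name), this));
        }
    }

    BoundOperator lookup(String operatorName) {
        return operators.get(operatorName);
    }
}

class OperatorTableDemo {
    public static void main(String[] args) {
        MappingEngine first = new MappingEngine();
        MappingEngine second = new MappingEngine();
        System.out.println(OperatorTable.loadCount);                      // prints 1
        System.out.println(first.lookup("Do") == second.lookup("Do"));    // prints false
        System.out.println(first.lookup("Do").owner == first);            // prints true
    }
}
```

Whether this saves a noticeable amount of time depends on how expensive the loading step really is, which only a measurement can tell.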

> Why not use one PageWriter for the whole document? (Just share it)
This could be done easily. 

> 2. PageWriter:
> Every time the convertToImage method is called there will be a new BufferedImage+Graphics.
> (but not as expensive as Graphics2D.scale() ;))
>
> Why not reuse a BufferedImage for all pages? It is faster to call
> Graphics2D.fillRect() than to create a new one.
This is a little bit complicated too. Within a PDF document, all pages may have a different size and orientation. So, from an object-oriented point of view, only the class PDPage "knows" everything about the page; consequently PDPage has to provide the conversion to an image.

At this point the main question is: did you ever implement your suggestions to compare the performance? Or are these thoughts theoretical? Did you perhaps use some kind of profiler?
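
Short of a profiler, a crude first comparison can be made with System.nanoTime(). The sketch below times the two image strategies discussed in this thread (a new BufferedImage per page versus one shared image that is only cleared). CanvasTiming and its method names are hypothetical, the page size and page count are arbitrary, and such hand-rolled timings are noisy, so the numbers are only a rough indication.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical micro-timing helper; all names are illustrative only.
class CanvasTiming {
    // Fills the whole image with white, standing in for "prepare a blank page".
    static void clear(BufferedImage img) {
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, img.getWidth(), img.getHeight());
        } finally {
            g.dispose();
        }
    }

    // Variant A: allocate a new image for every page.
    static long nanosWithFreshImages(int pages, int width, int height) {
        long start = System.nanoTime();
        for (int i = 0; i < pages; i++) {
            clear(new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB));
        }
        return System.nanoTime() - start;
    }

    // Variant B: allocate once, then only clear between pages.
    static long nanosWithSharedImage(int pages, int width, int height) {
        long start = System.nanoTime();
        BufferedImage shared = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < pages; i++) {
            clear(shared);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm-up round so the JIT has seen both code paths before measuring.
        nanosWithFreshImages(20, 600, 800);
        nanosWithSharedImage(20, 600, 800);

        long fresh = nanosWithFreshImages(100, 600, 800);
        long shared = nanosWithSharedImage(100, 600, 800);
        System.out.println("fresh images: " + fresh / 1000000 + " ms");
        System.out.println("shared image: " + shared / 1000000 + " ms");
    }
}
```

This only measures canvas preparation. The time spent decoding and painting page content is not included, so a profiler run over a real document is still the better basis for a decision.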

I'm using this feature (convertToImage, printing) as well, and I don't have any serious performance issues. But if there is some potential to speed up PDFBox, let's try to do it. We just have to weigh the costs against the benefits.


Greetings from rainy Essen just in the middle of the Ruhrpott ;-)
Andreas


Re: Performance Tuning

Posted by Daniel Manzke <da...@googlemail.com>.
Hi Andreas,
I will give it a try with my English. ;)


I analyzed the PDPage class a little. Since I want to convert pages to
images, I often call PDPage.convertToImage(), so this is my starting
point for the analysis.

1. PageWriter:
Every time the convertToImage method is called, a new PageDrawer is
created and the parent class PDFStreamEngine loads the resource
bundles.

Why not use some kind of cache for loading the bundles? (Load the resources
just once.)
Why not use one PageWriter for the whole document? (Just share it.)

2. PageWriter:
Every time the convertToImage method is called, a new
BufferedImage+Graphics is created (but that is not as expensive as
Graphics2D.scale() ;)).

Why not reuse a BufferedImage for all pages? It is faster to call
Graphics2D.fillRect() than to create a new one.



That is all for now. ;)


Best regards from Berlin,
Daniel


AW: Performance Tuning

Posted by An...@rwe.com.
Hi Daniel,

Don't hesitate; I guess your suggestions are welcome whether or not they will be included.


Andreas

