Posted to dev@pdfbox.apache.org by Maruan Sahyoun <sa...@fileaffairs.de> on 2013/03/08 10:52:52 UTC

Handling page imports

Hi,

Currently there are several areas in PDFBox where pages are imported from PDFs and reused to form new content, e.g. Overlay, OverlayPDF, PDFMerger, PDFSplit. Some of these have their own ways of handling the actual import; others reuse utility classes. For overlay purposes we need the imported page as an XObject; for splitting that's not necessary.

As I do not have a complete overview of the library: would it make sense to come up with something like a PageManager to handle these tasks, e.g. PageManager.importPage(PDPage page), PageManager.importPage(PDDocument pdDocument, int pageNumber) …, or is that not needed? Is a call to PDPage.getContents() reliable for getting the content stream, or does it have to be done by iterating over and copying the individual parts, as has been done in OverlayPDF? Could that be enhanced? Should we always handle page imports as XObjects?
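
To make the question above concrete, here is a minimal sketch of what such a low-level PageManager could look like. It is only an illustration under loud assumptions: SketchPage, SketchDocument and SketchForm are hypothetical stand-ins and not PDFBox classes, importPageAsForm is a made-up name for the XObject case, and pageNumber is assumed to be 1-based. It shows the two import styles from the paragraph side by side: an independent copy (what splitting and merging need) and a single reusable form (what overlays need).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a page: only its content-stream parts. Not a PDFBox class. */
final class SketchPage {
    final List<String> contentParts;

    SketchPage(List<String> contentParts) {
        // Defensive copy, so an imported page is independent of its source.
        this.contentParts = Collections.unmodifiableList(new ArrayList<>(contentParts));
    }
}

/** Hypothetical stand-in for a document: an ordered list of pages. */
final class SketchDocument {
    final List<SketchPage> pages = new ArrayList<>();
}

/** Hypothetical reusable form (the XObject case): one merged stream. */
final class SketchForm {
    final String content;

    SketchForm(String content) {
        this.content = content;
    }
}

/** Hypothetical low-level PageManager: one place for both kinds of import. */
final class PageManager {
    private final SketchDocument target;

    PageManager(SketchDocument target) {
        this.target = target;
    }

    /** Split/merge case: append an independent copy of the page to the target. */
    SketchPage importPage(SketchPage page) {
        SketchPage copy = new SketchPage(page.contentParts);
        target.pages.add(copy);
        return copy;
    }

    /** Convenience overload; pageNumber is assumed to be 1-based in this sketch. */
    SketchPage importPage(SketchDocument source, int pageNumber) {
        return importPage(source.pages.get(pageNumber - 1));
    }

    /** Overlay case: merge all content parts into one form; the target is not modified. */
    SketchForm importPageAsForm(SketchPage page) {
        return new SketchForm(String.join("\n", page.contentParts));
    }
}
```

Whether a real implementation should always go through the form variant is exactly the open question; the sketch only shows that both styles can sit behind one class.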

Thanks for your feedback on these open questions.

Maruan Sahyoun

Re: Handling page imports

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 10.03.2013 14:25, Glen Peterson wrote:
>> BTW I quickly looked at your contribution. You put a lot of effort into what was a completely missing part!
>
> Thanks for taking the time to look, and for the compliment - you just
> made my day!
>
>> PageManager I was talking about is more low level than yours which is more towards a LayoutManager
>
> Ooh, good feedback.  Since your email, I'm planning to rename it.
>
>> A higher level API like yours could then rely on the low level API.  There might be some overlap though.
>
> Yes, I could see it being completely separate, though there is a
> strong dependency.  The dependency made me think that it belongs with
> PDFBox, especially since the collection of features called iText
> includes layout-manager functionality.
>
> It occurs to me that you guys are getting ready for a release and
> might not want to consider adding a whole new feature until you start
> a new chunk of development.  Also, it really can be released
Correct, unfortunately it is too late, and I don't want to postpone the release,
as a lot of people are waiting for it.

> completely separately from PDFBox and you are currently breaking
> PDFBox up into some smaller projects.  I'm thinking now of calling the
> project com.planbase.pdf.LayoutManager or some such thing and hosting
> it on GitHub under the Apache license.  That will let me track it in
> source control and make it easier for me to move forward with it
> without cluttering up your mailing list.  If people use it, it's just
> not that hard for them to have to change a few imports to reflect it
> being moved to a different project.
+1

> You guys know it exists, and you know I'm excited about incorporating
> it into PDFBox if you want it there.  But I think that for now, it's
> probably best to consider it a separate project until we are all ready
> to put the two together if we decide that's a good idea.  When I
> actually make the move, I'll remove the code from the JIRA issue and
> replace it with a link to the GitHub project.
I really appreciate your offer! There are a lot of people looking for such
features. I guess it would fit perfectly into our next major release, 2.0. Until
then you might want to have a look at our license agreements [1]. Your
contribution would be a substantial change to our codebase, and we would ask you
to sign an ICLA/CCLA. Feel free to ask if anything is unclear.


> Thanks!
>
> -Glen K. Peterson
>

Thanks again for your offer and your interest in PDFBox!

BR
Andreas Lehmkühler

[1] http://www.apache.org/licenses/#clas

Re: Handling page imports

Posted by Glen Peterson <gl...@organicdesign.org>.
> BTW I quickly looked at your contribution. You put a lot of effort into what was a completely missing part!

Thanks for taking the time to look, and for the compliment - you just
made my day!

> PageManager I was talking about is more low level than yours which is more towards a LayoutManager

Ooh, good feedback.  Since your email, I'm planning to rename it.

> A higher level API like yours could then rely on the low level API.  There might be some overlap though.

Yes, I could see it being completely separate, though there is a
strong dependency.  The dependency made me think that it belongs with
PDFBox, especially since the collection of features called iText
includes layout-manager functionality.

It occurs to me that you guys are getting ready for a release and
might not want to consider adding a whole new feature until you start
a new chunk of development.  Also, it really can be released
completely separately from PDFBox and you are currently breaking
PDFBox up into some smaller projects.  I'm thinking now of calling the
project com.planbase.pdf.LayoutManager or some such thing and hosting
it on GitHub under the Apache license.  That will let me track it in
source control and make it easier for me to move forward with it
without cluttering up your mailing list.  If people use it, it's just
not that hard for them to have to change a few imports to reflect it
being moved to a different project.

You guys know it exists, and you know I'm excited about incorporating
it into PDFBox if you want it there.  But I think that for now, it's
probably best to consider it a separate project until we are all ready
to put the two together if we decide that's a good idea.  When I
actually make the move, I'll remove the code from the JIRA issue and
replace it with a link to the GitHub project.

Thanks!

-Glen K. Peterson

Re: Handling page imports

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Glen,

Thanks for your feedback. I was thinking along the lines of generalizing how to deal with page imports, so the PageManager I was talking about is more low-level than yours, which leans more towards a LayoutManager. If you look at Overlay.java, OverlayPDF.java …, they all handle it slightly differently (as I did in some of our projects). It might also be possible to add functions to change the page order … A higher-level API like yours could then rely on the low-level API. There might be some overlap, though. BTW, I quickly looked at your contribution. You put a lot of effort into what was a completely missing part!

With kind regards - Maruan

On 08.03.2013 at 14:09, Glen Peterson <gl...@organicdesign.org> wrote:

> The concept of a page-manager is a useful one, and it makes sense to
> me to group the functionality you suggest with the stuff I called a
> page manager (handles reusing images, line-breaking, and
> page-breaking).  A new level of abstraction (a page manager) is
> necessary in order to cache some things before writing them to the
> underlying stream (cache lines as the line-breaking is being
> calculated, cache pages as the page-breaking is being calculated).
> Here is the PageManager code I submitted last week.  It doesn't import
> pages from other PDFs, but if people decide to incorporate this code
> into PDFBox, then I think your functionality would belong on this same
> PageManager:
> https://issues.apache.org/jira/browse/PDFBOX-1527
> --
> Glen K. Peterson
> (828) 393-0081


Re: Handling page imports

Posted by Glen Peterson <gl...@organicdesign.org>.
The concept of a page-manager is a useful one, and it makes sense to
me to group the functionality you suggest with the stuff I called a
page manager (handles reusing images, line-breaking, and
page-breaking).  A new level of abstraction (a page manager) is
necessary in order to cache some things before writing them to the
underlying stream (cache lines as the line-breaking is being
calculated, cache pages as the page-breaking is being calculated).
Here is the PageManager code I submitted last week.  It doesn't import
pages from other PDFs, but if people decide to incorporate this code
into PDFBox, then I think your functionality would belong on this same
PageManager:
https://issues.apache.org/jira/browse/PDFBOX-1527
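
The buffering idea described above can be sketched in a few lines. This is not the code attached to PDFBOX-1527; it is a hypothetical illustration in which BufferingLayout is a made-up class and line width is measured in characters rather than real font metrics. Words are cached until a line is full, lines are cached until a page is full, and only completed pages are handed on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: caches words into lines and lines into pages before emitting. */
final class BufferingLayout {
    private final int maxCharsPerLine;   // assumption: width is counted in characters
    private final int maxLinesPerPage;
    private final StringBuilder currentLine = new StringBuilder();
    private final List<String> currentPage = new ArrayList<>();
    private final List<List<String>> finishedPages = new ArrayList<>();

    BufferingLayout(int maxCharsPerLine, int maxLinesPerPage) {
        this.maxCharsPerLine = maxCharsPerLine;
        this.maxLinesPerPage = maxLinesPerPage;
    }

    /** Adds one word; starts a new line first if the word would not fit. */
    void addWord(String word) {
        int needed = currentLine.length() == 0
                ? word.length()
                : currentLine.length() + 1 + word.length();
        if (needed > maxCharsPerLine && currentLine.length() > 0) {
            flushLine();
        }
        if (currentLine.length() > 0) {
            currentLine.append(' ');
        }
        // A word longer than the line width is placed on its own line and overflows.
        currentLine.append(word);
    }

    /** Moves the cached line onto the cached page; closes the page when it is full. */
    private void flushLine() {
        currentPage.add(currentLine.toString());
        currentLine.setLength(0);
        if (currentPage.size() == maxLinesPerPage) {
            flushPage();
        }
    }

    /** Moves the cached page to the finished list; a real writer would emit it here. */
    private void flushPage() {
        finishedPages.add(new ArrayList<>(currentPage));
        currentPage.clear();
    }

    /** Flushes whatever is still cached and returns all pages. */
    List<List<String>> finish() {
        if (currentLine.length() > 0) {
            flushLine();
        }
        if (!currentPage.isEmpty()) {
            flushPage();
        }
        return finishedPages;
    }
}
```

The point of the extra layer is that nothing reaches the underlying stream until the line break and the page break are both known, which is why it cannot simply write word by word.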

--
Glen K. Peterson
(828) 393-0081