Posted to fop-dev@xmlgraphics.apache.org by Ben Litchfield <be...@benlitchfield.com> on 2006/03/09 21:43:22 UTC

Combine FOP & PDFBox efforts?

Hello all,

I am the main developer of PDFBox, an open source(BSD) PDF library.

FOP contains PDF library functionality(specifically classes in 
org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do 
very similar things they contain a lot of overlapping code, but the pdf 
package in FOP has some features that PDFBox does not and PDFBox has 
some features that the FOP pdf package does not.

I propose that classes in FOP's package be 'merged' into the PDFBox 
library and FOP utilize PDFBox for PDF functionality.

I think we should do this for a variety of reasons:
-PDFBox & FOP benefit by gaining functionality
-PDFBox & FOP benefit by having a larger user base, which means code is 
used more, tested more, contributed to more
-The entire community benefits by having higher quality PDF components 
available
-There are several projects that currently take FOP output and perform 
post processing with PDFBox; this could be optimized if FOP used PDFBox 
as its core
-Future core PDF development efforts will no longer be duplicated 
between these two projects

I wanted to gauge interest from FOP developers and start to think about 
how we can make this work.  What do you guys think?

Ben Litchfield
http://www.pdfbox.org/



Re: Combine FOP & PDFBox efforts?

Posted by Chris Bowditch <bo...@hotmail.com>.
Ben Litchfield wrote:

<snip/>

> I propose that classes in FOP's package be 'merged' into the PDFBox 
> library and FOP utilize PDFBox for PDF functionality.
> 
> I think we should do this for a variety of reasons;
> -PDFBox & FOP benefit by gaining functionality
> -PDFBox & FOP benefit by having a larger user base, which means code is 
> used more, tested more, contributed to more
> -The entire community benefits by having higher quality PDF components 
> available
> -There are several projects that currently take FOP output and perform 
> post processing with PDFBox, this could be optimized if FOP used PDFBox 
> as its core
> -Future core PDF development efforts will no longer be duplicated 
> between these two projects

Thanks for coming forward with this proposal. It certainly looks like 
both projects have a lot to gain from such a merge.

The one who really needs to comment on this proposal is Jeremias, as he 
had plans to take the PDF library out of FOP's code base and make it a 
separate library in the XML Graphics Commons project. It could be that when 
we do this, we also merge with the PDFBox library.

I believe Jeremias is unwell at the moment, so he might not be able to 
comment for a few days. Jeremias is also well versed in the ASF position 
on licensing.

> 
> I wanted to gauge interest from FOP developers and start to think about 
> how we can make this work.  What do you guys think?

In short, it's a good idea :)

Chris



Re: Combine FOP & PDFBox efforts?

Posted by Manuel Mall <mm...@arcus.com.au>.
> I spent a little time on the Apache Licensing page, and didn't find
> anywhere that said it was compatible (I'm not saying it isn't
> compatible, just that I didn't see anything that said it was... in
> the 5 minutes I looked). As for the rest of the licensing stuff, I
> don't know. But the answer may be on the Apache Licensing page[3]
> somewhere.
>

According to http://wiki.apache.org/jakarta/LicenceIssues (which is NOT
official ASF position) BSD is OK. See also:
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200512.mbox/browser.

Manuel

<snip/>


Re: Combine FOP & PDFBox efforts?

Posted by Clay Leeds <we...@mac.com>.
I spent a little time on the Apache Licensing page, and didn't find  
anywhere that said it was compatible (I'm not saying it isn't  
compatible, just that I didn't see anything that said it was... in  
the 5 minutes I looked). As for the rest of the licensing stuff, I  
don't know. But the answer may be on the Apache Licensing page[3]  
somewhere.

In any case, I suspect other FOP-dev codies will have more to say  
about the whole prospect of working together. I just thought I'd get  
the ball rolling a bit.

Clay

On Mar 9, 2006, at 7:39 PM, Ben Litchfield wrote:
> Hi Clay,
>
> I am glad to hear this sounds like a possibility.
>
> PDFBox is currently licensed under the BSD license.  I did not  
> initially envision a change in licensing, but I am open to  
> possibilities if necessary.  Is there a reason it would need to  
> change?
>
> It is my understanding that Apache projects can utilize projects  
> that are BSD licensed.  Is it possible for the existing FOP pdf  
> classes to become part of PDFBox under the BSD license?
>
> Ben
>
>
> Clay Leeds wrote:
>> On Mar 9, 2006, at 12:43 PM, Ben Litchfield wrote:
>>> Hello all,
>>>
>>> I am the main developer of PDFBox, an open source(BSD) PDF library.
>>>
>>> FOP contains PDF library functionality(specifically classes in
>>> org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do
>>> very similar things they contain a lot of overlapping code, but  
>>> the pdf
>>> package in FOP has some features that PDFBox does not and PDFBox has
>>> some features that the FOP pdf package does not.
>>>
>>> I propose that classes in FOP's package be 'merged' into the PDFBox
>>> library and FOP utilize PDFBox for PDF functionality.
>>
>> <snip>
>>
>>> Ben Litchfield
>>> http://www.pdfbox.org/
>>
>> Thank you for your interest, Ben. Although I don't speak for  
>> everyone, it does look intriguing to me. You may want to clarify  
>> how you envision PDFBox will be licensed (would this be a software  
>> license grant[1]?). I don't know the details on the BSD license.  
>> Also, I assume you would provide a software license grant and fill  
>> out a CLA[2].
>>
>> [1]
>> http://www.apache.org/licenses/#grants
>> [2]
>> http://www.apache.org/licenses/#clas

[3]
http://www.apache.org/foundation/licence-FAQ.html

Clay Leeds
webmaestro@mac.com

My religion is simple. My religion is kindness.
-- HH Dalai Lama of Tibet




Re: Combine FOP & PDFBox efforts?

Posted by Ben Litchfield <be...@benlitchfield.com>.
Hi Clay,

I am glad to hear this sounds like a possibility.

PDFBox is currently licensed under the BSD license.  I did not initially 
envision a change in licensing, but I am open to possibilities if 
necessary.  Is there a reason it would need to change?

It is my understanding that Apache projects can utilize projects that 
are BSD licensed.  Is it possible for the existing FOP pdf classes to 
become part of PDFBox under the BSD license?

Ben


Clay Leeds wrote:
> On Mar 9, 2006, at 12:43 PM, Ben Litchfield wrote:
>> Hello all,
>>
>> I am the main developer of PDFBox, an open source(BSD) PDF library.
>>
>> FOP contains PDF library functionality(specifically classes in
>> org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do
>> very similar things they contain a lot of overlapping code, but the pdf
>> package in FOP has some features that PDFBox does not and PDFBox has
>> some features that the FOP pdf package does not.
>>
>> I propose that classes in FOP's package be 'merged' into the PDFBox
>> library and FOP utilize PDFBox for PDF functionality.
>
> <snip>
>
>> Ben Litchfield
>> http://www.pdfbox.org/
>
> Thank you for your interest, Ben. Although I don't speak for everyone, 
> it does look intriguing to me. You may want to clarify how you 
> envision PDFBox will be licensed (would this be a software license 
> grant[1]?). I don't know the details on the BSD license. Also, I 
> assume you would provide a software license grant and fill out a CLA[2].
>
> [1]
> http://www.apache.org/licenses/#grants
> [2]
> http://www.apache.org/licenses/#clas
>
> Clay Leeds
> webmaestro@mac.com
>
> My religion is simple. My religion is kindness.
> -- HH Dalai Lama of Tibet
>
>


Re: Combine FOP & PDFBox efforts?

Posted by Clay Leeds <we...@mac.com>.
On Mar 9, 2006, at 12:43 PM, Ben Litchfield wrote:
> Hello all,
>
> I am the main developer of PDFBox, an open source(BSD) PDF library.
>
> FOP contains PDF library functionality(specifically classes in
> org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do
> very similar things they contain a lot of overlapping code, but the  
> pdf
> package in FOP has some features that PDFBox does not and PDFBox has
> some features that the FOP pdf package does not.
>
> I propose that classes in FOP's package be 'merged' into the PDFBox
> library and FOP utilize PDFBox for PDF functionality.

<snip>

> Ben Litchfield
> http://www.pdfbox.org/

Thank you for your interest, Ben. Although I don't speak for  
everyone, it does look intriguing to me. You may want to clarify how  
you envision PDFBox will be licensed (would this be a software  
license grant[1]?). I don't know the details on the BSD license.  
Also, I assume you would provide a software license grant and fill  
out a CLA[2].

[1]
http://www.apache.org/licenses/#grants
[2]
http://www.apache.org/licenses/#clas

Clay Leeds
webmaestro@mac.com

My religion is simple. My religion is kindness.
-- HH Dalai Lama of Tibet



Re: Combine FOP & PDFBox efforts?

Posted by Christian Geisert <ch...@isu-gmbh.de>.
Ben Litchfield schrieb:
> Hello all,

Hi Ben,

> I am the main developer of PDFBox, an open source(BSD) PDF library.

How many other developers are working on PDFBox?

[..]

> I propose that classes in FOP's package be 'merged' into the PDFBox 
> library and FOP utilize PDFBox for PDF functionality.

Are you proposing to do this on SourceForge or here at the ASF?

-- 
Christian

Re: Combine FOP & PDFBox efforts?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
I believe we're not talking about the same aspect. I'm not saying that
having support for parsed PDF in FOP is off-topic. I'm very much for
having that as resources allow. I was talking about adopting PDFBox.
PDFBox itself is a project big enough to support its own community.
Integrating it into FOP would kill its visibility. Integrating it into
XML Graphics means stretching the project's mission quite a bit. It
would have to be a separate subproject (same level as Batik, FOP and
Commons), otherwise its visibility is not good enough. We would hurt
PDFBox that way.

If there's enough energy coming from the PDFBox community (not just Ben),
we could help it into the ASF as an Incubator project destined for its
own TLP. But that's not a decision to be taken lightly. But it would be
a cool thing. I simply don't think XML Graphics is the place for PDFBox.

On 13.03.2006 10:44:09 Chris Bowditch wrote:
> Jeremias Maerki wrote:
> 
> <snip/>
> 
> > * Adopting PDFBox into the ASF is certainly an option if the people
> > involved in PDFBox really want that. A full PDF library with parsing and
> > rendering support might go beyond the XML Graphics' project boundaries,
> > however. It might need to go into a separate project. And that would
> > certainly be a big step which would need a lot of energy.
> 
> Jeremias, I'm not sure I agree with this comment. We get a lot of 
> customers asking for an XSL-FO solution that can include a static PDF as 
> the last page or similar. RenderX already offers the ability to include 
> a PDF in a fo:external-graphic. If FOP had the ability to parse a PDF, 
> then this feature would be a possibility for FOP *g*
> 
> Chris



Jeremias Maerki


Re: Combine FOP & PDFBox efforts?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
(Sorry for my late answer. Been away the last two days.)

On 13.03.2006 23:52:40 Ben Litchfield wrote:
> Chris,
> 
> 
> > I don't think FOP should step up to a minimum of 1.4. Just last week 
> a 
> > user was saying on the user mailing list that needed FOP to run on 
> JDK 
> > 1.2. 
> 
> 
> You have valid concerns, I will revisit exactly what parts of PDFBox 
> require 1.4.  I suspect it is only small sections and potentially 
> sections not required for FOP functionality.  For example, converting a 
> PDF to an image requires 1.4, but that is not functionality that would 
> be used by FOP.  It should not be too difficult to isolate these 
> sections.  
> 
> My only other comment(and I am just ranting) on this issue is that 
> staying compatible with 1.3 requires effort, which diverts effort from 
> added features or fixing bugs.  Staying on the bleeding edge also takes 
> a lot of effort, but 1.4 has been around for over 4 years, we need to 
> cutover at some point.

I've just downloaded the source code for pdfbox and fontbox from CVS and went
through it: 

There are a few string routines which are easily made available in JDK
1.3. The same goes for W3C DOM and JAXP. The Color constants simply have to
be written in lower case. What remains is the use of regular expressions,
which occurs only in utility classes that FOP would not need.

As you said, ImageIO is probably the biggest problem for JDK 1.3
compatibility. That would take some work, but not too much I think,
since some of the conversion functions could be taken out of the basic
PD model classes and put into helper classes/functions. This can make
the basic PDFBox JDK 1.3 compatible (with some minimal restrictions) and
useful for FOP without too much effort, probably without even having to
break backwards-compatibility.
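
The kind of change described above can be sketched roughly as follows. This
is only an illustration: the class Jdk13Compat and its methods are
hypothetical names and are not taken from the PDFBox source. It avoids
String.split (added in JDK 1.4) by using an indexOf loop, and it uses the
lower-case Color constant, which predates the upper-case alias added in 1.4.

```java
import java.awt.Color;

// Hypothetical helper, not part of PDFBox: JDK 1.3-friendly
// replacements for two JDK 1.4 conveniences.
final class Jdk13Compat {

    private Jdk13Compat() {
    }

    // Splits on a single character without String.split (JDK 1.4).
    // Unlike String.split, trailing empty fields are kept.
    static String[] split(String text, char separator) {
        // First pass: count the fields so a plain array can be used.
        int count = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == separator) {
                count++;
            }
        }
        // Second pass: cut the fields out with indexOf/substring.
        String[] parts = new String[count];
        int start = 0;
        for (int n = 0; n < count - 1; n++) {
            int end = text.indexOf(separator, start);
            parts[n] = text.substring(start, end);
            start = end + 1;
        }
        parts[count - 1] = text.substring(start);
        return parts;
    }

    // Color.black predates JDK 1.4; Color.BLACK was added in 1.4.
    static Color defaultStrokeColor() {
        return Color.black;
    }

    public static void main(String[] args) {
        String[] fields = split("Helvetica,12,bold", ',');
        for (int i = 0; i < fields.length; i++) {
            System.out.println(fields[i]);
        }
        System.out.println(defaultStrokeColor());
    }
}
```

The ImageIO-dependent conversions would need a different treatment (moving
them into separate helper classes, as described above); this sketch does not
attempt to show that.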

> 
> > As Jeremias already mentioned FORay Font is a standalone Font library 
> > that has evolved from FOP 0.20.5. 
> 
> 
> I just joined the foray developer list and hope to start helping.  I 
> don't fully understand how the Foray font package is changing to 
> support FOP, but that is off topic so I'll start a new discussion 
> thread on that list.

I've seen it. It's good that you did that. We all depend on fonts.


Jeremias Maerki


Re: Combine FOP & PDFBox efforts?

Posted by Web Maestro Clay <th...@gmail.com>.
On Mar 13, 2006, at 1:39 AM, Chris Bowditch wrote:
> Ben Litchfield wrote:
>> Jeremias,
>> I'll start by answering your questions
>> 1)What is minimum JDK required by PDFBox?
>> PDFBox currently requires 1.4, because it uses ImageIO and a  
>> couple other things that make development much easier.  PDFBox was  
>> compatible with 1.3 for a long time, but I made a decision that  
>> sticking with 1.3 would cost too much in development time versus  
>> using existing stuff in 1.4.  In addition 1.3 is now two major  
>> versions old and in the EOL phase.  As this effort will take some  
>> time before it could be released would it be reasonable to move  
>> the minimum requirement up to 1.4 for Batik and FOP at that time?
>
> I don't think FOP should step up to a minimum of 1.4. Just last  
> week a user was saying on the user mailing list that needed FOP to  
> run on JDK 1.2. A lot of big corporates are slow to upgrade their  
> unix operating systems. And sometimes it is very difficult to get  
> later JDK's working on older Unix platforms. Now since FOP is used  
> a lot in batch processing, we should not be so quick to exclude  
> these big corporations with old Unix platforms from using FOP.

In the case of my former company, we could not upgrade our machine  
from the version of AIX (I think it was 4.x), and IBM 1.3.x was the  
most recent version of JRE available.

One 'option' available is fop-0.20.5 (which works great!). As long  
as it remains available, that would be sufficient. <ducking>If we did  
go that route, it may make sense to fix a few bugs in fop-0.20.5.</ducking>

Clay Leeds
the.webmaestro@gmail.com

My religion is simple. My religion is kindness.
-- HH Dalai Lama of Tibet




Re: Combine FOP & PDFBox efforts?

Posted by Chris Bowditch <bo...@hotmail.com>.
Ben Litchfield wrote:

> Jeremias,
> 
> I'll start by answering your questions
> 
> 1)What is minimum JDK required by PDFBox?
> 
> PDFBox currently requires 1.4, because it uses ImageIO and a couple 
> other things that make development much easier.  PDFBox was compatible 
> with 1.3 for a long time, but I made a decision that sticking with 1.3 
> would cost too much in development time versus using existing stuff in 
> 1.4.  In addition 1.3 is now two major versions old and in the EOL 
> phase.  As this effort will take some time before it could be released 
> would it be reasonable to move the minimum requirement up to 1.4 for 
> Batik and FOP at that time?

I don't think FOP should step up to a minimum of 1.4. Just last week a 
user was saying on the user mailing list that they needed FOP to run on JDK 
1.2. A lot of big corporations are slow to upgrade their Unix operating 
systems, and sometimes it is very difficult to get later JDKs working 
on older Unix platforms. Since FOP is used a lot in batch 
processing, we should not be so quick to exclude these big corporations 
with old Unix platforms from using FOP.

<snip/>

> Some additional comments,
> *After the 0.7.2 release, PDFBox split the font infrastructure into 
> another project, so aptly named FontBox.  No official version has been 
> released yet but the project was created and all font parsing logic was 
> separated from PDFBox.  As far as I can tell there is no open source 
> font library and for many of the same reasons we have discussed I 
> thought it would be better as a separate project.  It sounds like there 
> has already been some discussion on making a separate font library 
> project, I would be happy to collaborate on and donate what little font 
> parsing code I have to that project.  It only makes sense for 
> PDFBox/FOP/Batik/... to all use a single font library.  It is starting 
> to sound like a unified font system might be the first task.

As Jeremias already mentioned FORay Font is a standalone Font library 
that has evolved from FOP 0.20.5. Take a look at the FORay project for 
more details:

http://foray.sourceforge.net/

> 
> *I did not realize that other projects(Batik) were using FOP's pdf 
> library, again a separate PDF&Font library makes that cleaner.  As a 
> side note, PDFs can contain SVG graphics, so I eventually saw PDFBox 
> utilizing Batik, which makes things interesting :)

Yes, the dependencies between Batik and FOP get confusing at times :)

<snip/>

Chris



Re: Combine FOP & PDFBox efforts?

Posted by Vincent Hennebert <vi...@enseeiht.fr>.
Hi Ben, hi All,

I finally have some time to chime in, sorry for the delay. Thank you for
your interest in the font subsystem.

My goal is to adapt the FOrayFont library to Fop. The main advantage of
FOrayFont over the Fop code is its ability to directly parse font files,
whereas currently with Fop there is a two-step process: first convert
the font metrics into an xml file, then use it within Fop through a
configuration file. The process is described in [1].

I submitted a first patch in December [2], which was refused because
of unacceptable shortcomings in FOrayFont. The main reasons were:
* lack of a default config file;
* configuration too complicated.
You will find all the details in [3]. Since then I have been working with
Victor on improving FOrayFont. We recently finished the design phase and
agreed on a set of changes that I still have to apply (you will
find the discussion in the FOray-dev mailing list archive from the last
two months; I'll add more on this on FOray-dev). After that, I believe
that the main shortcomings will be corrected and that an updated patch
can be submitted.

PDFBox is pretty independent of my work. I currently rely entirely on
the Fop PDF library for PDF output, and I'm only adapting what is
necessary to make it use FOrayFont. FOrayFont is a low-level library that
tries to be independent of any output format, and thus may be used by
any renderer. So if PDFBox were to be used by Fop, for me it would
just mean that I would have to adapt PDFBox instead of the Fop library.

For FontBox this is different, and I think there is a possibility to
share resources in this area. I'll put more details on FOray-dev, but in
short it would be great if we could achieve the following:
* merge the best of FontBox and FOrayFont to obtain a good font library;
* agree on a common interface (i.e., an API) for the font library, that
   would be used jointly by Fop, PDFBox and FOray;
* adapt PDFBox to make it use this resulting library;
* make it work with Fop in some manner.
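
To make the idea of a common, output-format-independent font interface a
bit more concrete, here is a minimal sketch. Everything in it is an
assumption for illustration: FontMetricsSource, FixedWidthFont and
TextMeasurer are hypothetical names and do not reflect the actual API of
FOrayFont, FontBox or Fop. The point is only that a renderer or a PDF
library could depend on a small interface rather than on a specific font
parser.

```java
// Hypothetical common font interface; not the FOrayFont or FontBox API.
interface FontMetricsSource {

    String getFontName();

    // Advance width of a character in 1/1000 of the font size.
    int getCharWidth(char c);
}

// Trivial implementation standing in for a parsed font file.
final class FixedWidthFont implements FontMetricsSource {

    private final String name;
    private final int width;

    FixedWidthFont(String name, int width) {
        this.name = name;
        this.width = width;
    }

    public String getFontName() {
        return name;
    }

    public int getCharWidth(char c) {
        return width;
    }
}

// A consumer (renderer, PDF library) that only knows the interface.
final class TextMeasurer {

    private TextMeasurer() {
    }

    // Width of the text in millipoints for a font size given in points.
    static int stringWidth(FontMetricsSource font, String text, int sizePt) {
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            total += font.getCharWidth(text.charAt(i));
        }
        return total * sizePt;
    }

    public static void main(String[] args) {
        FontMetricsSource mono = new FixedWidthFont("Mono", 600);
        System.out.println(stringWidth(mono, "abc", 10));
    }
}
```

With an interface of this kind, swapping the font source would not require
changes in the code that measures or lays out text.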

I would like to work with you on the first two points. As you have
probably already noticed, the discussion will mainly be held in the FOray
area. We will chime in here for Fop-specific things and to notify Fop
devs of advancements of the adaptation work.

I'm glad to see that there is room for collaboration. I'm sure that we
will be able to achieve Great Things ;-)

Cheers,
Vincent


Current way to configure fonts in Fop:
[1] http://xmlgraphics.apache.org/fop/trunk/fonts.html

Patch for the adaptation of FOrayFont to Fop (now outdated):
[2] http://issues.apache.org/bugzilla/show_bug.cgi?id=35948

Reasons for the patch refusal:
[3] 
http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-dev/200512.mbox/browser



Ben Litchfield a écrit :
> Jeremias,
> 
> I'll start by answering your questions
> 
> 1)What is minimum JDK required by PDFBox?
> 
> PDFBox currently requires 1.4, because it uses ImageIO and a couple 
> other things that make development much easier.  PDFBox was compatible 
> with 1.3 for a long time, but I made a decision that sticking with 1.3 
> would cost too much in development time versus using existing stuff in 
> 1.4.  In addition 1.3 is now two major versions old and in the EOL 
> phase.  As this effort will take some time before it could be released 
> would it be reasonable to move the minimum requirement up to 1.4 for 
> Batik and FOP at that time?
> 
> 2)Does PDFBox require log4j?
> 
> PDFBox used to be dependent on log4j, 0.7.2 has an optional dependency, 
> the soon to be released 0.7.3 version will not use log4j at all.
> Currently PDFBox's only dependency is FontBox(see comments below), 
> although bouncy castle will soon become an optional dependency for 
> certificate based encryption and rhino(looks like Batik uses this as 
> well) will also be an optional dependency for Javascript execution.
> 
> 
> Some additional comments,
> *After the 0.7.2 release, PDFBox split the font infrastructure into 
> another project, so aptly named FontBox.  No official version has been 
> released yet but the project was created and all font parsing logic was 
> separated from PDFBox.  As far as I can tell there is no open source 
> font library and for many of the same reasons we have discussed I 
> thought it would be better as a separate project.  It sounds like there 
> has already been some discussion on making a separate font library 
> project, I would be happy to collaborate on and donate what little font 
> parsing code I have to that project.  It only makes sense for 
> PDFBox/FOP/Batik/... to all use a single font library.  It is starting 
> to sound like a unified font system might be the first task.
> 
> *I did not realize that other projects(Batik) were using FOP's pdf 
> library, again a separate PDF&Font library makes that cleaner.  As a 
> side note, PDFs can contain SVG graphics, so I eventually saw PDFBox 
> utilizing Batik, which makes things interesting :)
> 
> *If bringing PDFBox into ASF is what is necessary to make this work than 
> I am willing to do that.  As you say, this requires a fair amount of 
> energy, so "just because" is not a good enough reason for me to to 
> expend the energy.
> 
> 
> It sounds like the first thing we need to do is get the font system 
> working.  I also like Jeremias' idea of experimenting with a copy of the 
> PDFRenderer, low risk and little disruption to ongoing work.
> 
> At a high level this sounds reasonable to me
> 1)Separate font system
> 2)PDFBox and FOP are independently updated to use a common font system
> 3)A copy of the PDF renderer is created and updated to utilize PDFBox
> 4)Go from there
> 
> No matter what is decided, steps 1&2 are desired and are already in 
> progress.  I would like to help with the creation of the font sub system 
> because I would like PDFBox to use it.
> 
> 
> Ben

Re: Combine FOP & PDFBox efforts?

Posted by Ben Litchfield <be...@benlitchfield.com>.
Jeremias,

I'll start by answering your questions

1)What is minimum JDK required by PDFBox?

PDFBox currently requires 1.4, because it uses ImageIO and a couple 
other things that make development much easier.  PDFBox was compatible 
with 1.3 for a long time, but I made a decision that sticking with 1.3 
would cost too much in development time versus using existing stuff in 
1.4.  In addition 1.3 is now two major versions old and in the EOL 
phase.  As this effort will take some time before it could be released 
would it be reasonable to move the minimum requirement up to 1.4 for 
Batik and FOP at that time?

2)Does PDFBox require log4j?

PDFBox used to be dependent on log4j, 0.7.2 has an optional dependency, 
the soon to be released 0.7.3 version will not use log4j at all. 

Currently PDFBox's only dependency is FontBox(see comments below), 
although bouncy castle will soon become an optional dependency for 
certificate based encryption and rhino(looks like Batik uses this as 
well) will also be an optional dependency for Javascript execution.
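
As an illustration of what an optional dependency can look like in Java,
here is a minimal sketch that probes for a class at runtime and only enables
a feature when it is present. This is a general technique, not a description
of how PDFBox actually does it; the class OptionalFeatures is a hypothetical
name, and the two probed class names are just examples of classes shipped by
Bouncy Castle and Rhino.

```java
// Hypothetical helper, not part of PDFBox: checks whether an optional
// library is on the classpath without creating a hard dependency on it.
final class OptionalFeatures {

    private OptionalFeatures() {
    }

    // Returns true if the named class can be loaded. The class is not
    // initialized, so no static initializers of the library are run.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                    OptionalFeatures.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Class found but unusable, e.g. its own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example probes; the result depends on the local classpath.
        System.out.println("certificate encryption available: "
                + isClassAvailable(
                        "org.bouncycastle.jce.provider.BouncyCastleProvider"));
        System.out.println("JavaScript execution available: "
                + isClassAvailable("org.mozilla.javascript.Context"));
    }
}
```

Code that needs the optional library would then sit behind such a check, so
the core classes still load when the jar is absent.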


Some additional comments,
*After the 0.7.2 release, PDFBox split the font infrastructure into 
another project, so aptly named FontBox.  No official version has been 
released yet but the project was created and all font parsing logic was 
separated from PDFBox.  As far as I can tell there is no open source 
font library and for many of the same reasons we have discussed I 
thought it would be better as a separate project.  It sounds like there 
has already been some discussion on making a separate font library 
project, I would be happy to collaborate on and donate what little font 
parsing code I have to that project.  It only makes sense for 
PDFBox/FOP/Batik/... to all use a single font library.  It is starting 
to sound like a unified font system might be the first task.

*I did not realize that other projects(Batik) were using FOP's pdf 
library, again a separate PDF&Font library makes that cleaner.  As a 
side note, PDFs can contain SVG graphics, so I eventually saw PDFBox 
utilizing Batik, which makes things interesting :)

*If bringing PDFBox into ASF is what is necessary to make this work then 
I am willing to do that.  As you say, this requires a fair amount of 
energy, so "just because" is not a good enough reason for me to 
expend the energy.


It sounds like the first thing we need to do is get the font system 
working.  I also like Jeremias' idea of experimenting with a copy of the 
PDFRenderer, low risk and little disruption to ongoing work.

At a high level this sounds reasonable to me
1)Separate font system
2)PDFBox and FOP are independently updated to use a common font system
3)A copy of the PDF renderer is created and updated to utilize PDFBox
4)Go from there

No matter what is decided, steps 1&2 are desired and are already in 
progress.  I would like to help with the creation of the font sub system 
because I would like PDFBox to use it.


Ben


Jeremias Maerki wrote:
> Ben,
>
> thank you for speaking up. As Chris guessed right, I've been out of the
> fight for the last few days. Still recovering...
>
> Since I've discovered PDFBox I've always played with the thought that
> one day we might put our resources together. You'll see below why I
> personally haven't put any energy into it, yet.
>
> First of all, let me reassure everyone that the BSD license PDFBox uses
> would be totally fine for us (PMC members should know that if they read
> the mails on the PMC list *g*). Remember, the Apache license originally
> emerged from the BSD license. Those who are here for a long time now
> might remember that there was once a short discussion about switching to
> iText (mid-March 2002). I don't remember the exact reasons why this
> wasn't pursued but I think the license was one of the reasons. iText is
> dual-licensed (MPL (more or less ok) and LGPL (no go)). I guess the itch
> was too feeble, too, at that time. However, if I'm not mistaken one of
> the FOP devs wrote a private PDF Renderer implementation using iText.
>
> That said, I don't think we have a big itch today. I would like to list
> a few points (in addition to Ben's) to consider without saying +1 or -1
> to the whole idea at this time (I haven't made up my mind, yet):
>
> * PDFBox looks like a well-maintained and well-structured project. The
> license is very liberal. Activity seems to be good and it's not a new
> project. Well, it probably suffers from the same problem as FOP: Lack of
> confidence to jump over the version 1.0 barrier. ;-)
> * FOP's PDF library is supposed to move to XML Graphics Commons in order
> to build a clean dependency tree for Batik and FOP, since not only FOP
> uses the PDF library to produce PDF.
> * The Batik devs are very cautious about adopting an external dependency.
> PDFBox would be such a thing. This means that working with PDFBox is not
> only a decision of the FOP subproject, but one for the whole XML
> Graphics project.
> * PDFBox has its own font infrastructure (font file parsers). Vincent
> Hennebert is still working with Victor Mote (of FOray) to improve
> FOrayFont and to prepare its integration/use in FOP so we profit from
> additional functionality. I think it would be important to make sure
> that the PDF library and the font subsystem remain as independent of
> each other as possible, i.e. it may be necessary to have multiple
> subclasses of basic PDF model objects to interface with the various font
> sources.
> * Switching to an externally managed library means giving away a
> certain amount of control and freedom. Changes may need more energy and
> time. But moving the PDF library from FOP to XML Graphics Commons will
> already mean a step in this direction. Two projects will depend on it
> which means more coordination.
> * Adopting PDFBox into the ASF is certainly an option if the people
> involved in PDFBox really want that. A full PDF library with parsing and
> rendering support might go beyond the XML Graphics' project boundaries,
> however. It might need to go into a separate project. And that would
> certainly be a big step which would need a lot of energy.
> * Talking about energy: Resources in FOP and Batik are still sparse.
> Switching the PDF library is a rather big task and would need investment
> from both XML Graphics and PDFBox sides. It might produce diversion from
> other tasks. Could we get that together? I may be a little pessimistic,
> but I doubt it at this time. Just look at the font stuff. Vincent
> has to play lone rider at the moment because I simply don't
> have the time to even closely track what's going on. And no one else
> seems to have time or motivation to jump in.
> * An idea: We could simply start an experiment and create a copy of our
> PDFRenderer in the sandbox which is converted to use PDFBox as PDF
> backend. If it evolves enough, we can switch the main implementation one
> day, i.e. just let evolution decide.
> * Integrating PDFBox would be cool because it would allow inserting
> arbitrary existing PDF pages or using preproduced PDF pages as page
> backgrounds, stamps, watermarks, external-graphic objects.
> * There's probably more to add here, but my head's starting to pound
> again....
>
> Questions:
> - What's the minimal JDK version for PDFBox? FOP and Batik need to
> remain JDK 1.3.1 compatible for the time being.
> - I've seen something about Log4J. I hope this is an optional dependency.
> Is it? One task during the migration of FOP's PDF library to XML
> Graphics Commons is to remove the dependency on JCL. That was a wish
> coming from Batik. I assume the same would apply to any other PDF
> library we'd use.
>
> On 09.03.2006 21:43:22 Ben Litchfield wrote:
>   
>> Hello all,
>>
>> I am the main developer of PDFBox, an open source(BSD) PDF library.
>>
>> FOP contains PDF library functionality(specifically classes in 
>> org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do 
>> very similar things they contain a lot of overlapping code, but the pdf 
>> package in FOP has some features that PDFBox does not and PDFBox has 
>> some features that the FOP pdf package does not.
>>
>> I propose that classes in FOP's package be 'merged' into the PDFBox 
>> library and FOP utilize PDFBox for PDF functionality.
>>
>> I think we should do this for a variety of reasons;
>> -PDFBox & FOP benefit by gaining functionality
>> -PDFBox & FOP benefit by having a larger user base, which means code is 
>> used more, tested more, contributed to more
>> -The entire community benefits by having higher quality PDF components 
>> available
>> -There are several projects that currently take FOP output and perform 
>> post processing with PDFBox, this could be optimized if FOP used PDFBox 
>> as its core
>> -Future core PDF development efforts will no longer be duplicated 
>> between these two projects
>>
>> I wanted to gauge interest from FOP developers and start to think about 
>> how we can make this work.  What do you guys think?
>>
>> Ben Litchfield
>> http://www.pdfbox.org/
>>     
>
>
>
> Jeremias Maerki
>
>   


Re: Combine FOP & PDFBox efforts?

Posted by Chris Bowditch <bo...@hotmail.com>.
Jeremias Maerki wrote:

<snip/>

> * Adopting PDFBox into the ASF is certainly an option if the people
> involved in PDFBox really want that. A full PDF library with parsing and
> rendering support might go beyond the XML Graphics' project boundaries,
> however. It might need to go into a separate project. And that would
> certainly be a big step which would need a lot of energy.

Jeremias, I'm not sure I agree with this comment. We get a lot of 
customers asking for an XSL-FO solution that can include a static PDF as 
the last page or similar. RenderX already offers the ability to include 
a PDF in a fo:external-graphic. If FOP had the ability to parse a PDF, 
then this feature would be a possibility for FOP *g*

Chris



Re: Combine FOP & PDFBox efforts?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Ben,

thank you for speaking up. As Chris guessed right, I've been out of the
fight for the last few days. Still recovering...

Ever since I discovered PDFBox, I've toyed with the thought that one day
we might pool our resources. You'll see below why I personally haven't
put any energy into it yet.

First of all, let me reassure everyone that the BSD license PDFBox uses
would be totally fine for us (PMC members should know that if they read
the mails on the PMC list *g*). Remember, the Apache license originally
emerged from the BSD license. Those who have been here for a long time
might remember that there was once a short discussion about switching to
iText (mid-March 2002). I don't remember the exact reasons why this
wasn't pursued but I think the license was one of the reasons. iText is
dual-licensed (MPL (more or less ok) and LGPL (no go)). I guess the itch
was too feeble, too, at that time. However, if I'm not mistaken one of
the FOP devs wrote a private PDF Renderer implementation using iText.

That said, I don't think we have a big itch today. I would like to list
a few points (in addition to Ben's) to consider without saying +1 or -1
to the whole idea at this time (I haven't made up my mind yet):

* PDFBox looks like a well-maintained and well-structured project. The
license is very liberal. Activity seems to be good and it's not a new
project. Well, it probably suffers from the same problem as FOP: Lack of
confidence to jump over the version 1.0 barrier. ;-)
* FOP's PDF library is supposed to move to XML Graphics Commons in order
to build a clean dependency tree for Batik and FOP, since not only FOP
uses the PDF library to produce PDF.
* The Batik devs are very cautious about adopting an external dependency.
PDFBox would be such a thing. This means that working with PDFBox is not
only a decision of the FOP subproject, but one for the whole XML
Graphics project.
* PDFBox has its own font infrastructure (font file parsers). Vincent
Hennebert is still working with Victor Mote (of FOray) to improve
FOrayFont and to prepare its integration/use in FOP so we profit from
additional functionality. I think it would be important to make sure
that the PDF library and the font subsystem remain as independent of
each other as possible, i.e. it may be necessary to have multiple
subclasses of basic PDF model objects to interface with the various font
sources.
* Switching to an externally managed library means giving away a
certain amount of control and freedom. Changes may need more energy and
time. But moving the PDF library from FOP to XML Graphics Commons will
already mean a step in this direction. Two projects will depend on it
which means more coordination.
* Adopting PDFBox into the ASF is certainly an option if the people
involved in PDFBox really want that. A full PDF library with parsing and
rendering support might go beyond the XML Graphics' project boundaries,
however. It might need to go into a separate project. And that would
certainly be a big step which would need a lot of energy.
* Talking about energy: Resources in FOP and Batik are still sparse.
Switching the PDF library is a rather big task and would need investment
from both XML Graphics and PDFBox sides. It might produce diversion from
other tasks. Could we get that together? I may be a little pessimistic,
but I doubt it at this time. Just look at the font stuff. Vincent
has to play lone rider at the moment because I simply don't
have the time to even closely track what's going on. And no one else
seems to have time or motivation to jump in.
* An idea: We could simply start an experiment and create a copy of our
PDFRenderer in the sandbox which is converted to use PDFBox as PDF
backend. If it evolves enough, we can switch the main implementation one
day, i.e. just let evolution decide.
* Integrating PDFBox would be cool because it would allow inserting
arbitrary existing PDF pages or using preproduced PDF pages as page
backgrounds, stamps, watermarks, external-graphic objects.
* There's probably more to add here, but my head's starting to pound
again....

Questions:
- What's the minimum JDK version for PDFBox? FOP and Batik need to
remain JDK 1.3.1 compatible for the time being.
- I've seen something about Log4J. I hope this is an optional dependency.
Is it? One task during the migration of FOP's PDF library to XML
Graphics Commons is to remove the dependency on JCL. That was a wish
coming from Batik. I assume the same would apply to any other PDF
library we'd use.

On 09.03.2006 21:43:22 Ben Litchfield wrote:
> Hello all,
> 
> I am the main developer of PDFBox, an open source(BSD) PDF library.
> 
> FOP contains PDF library functionality(specifically classes in 
> org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do 
> very similar things they contain a lot of overlapping code, but the pdf 
> package in FOP has some features that PDFBox does not and PDFBox has 
> some features that the FOP pdf package does not.
> 
> I propose that classes in FOP's package be 'merged' into the PDFBox 
> library and FOP utilize PDFBox for PDF functionality.
> 
> I think we should do this for a variety of reasons;
> -PDFBox & FOP benefit by gaining functionality
> -PDFBox & FOP benefit by having a larger user base, which means code is 
> used more, tested more, contributed to more
> -The entire community benefits by having higher quality PDF components 
> available
> -There are several projects that currently take FOP output and perform 
> post processing with PDFBox, this could be optimized if FOP used PDFBox 
> as its core
> -Future core PDF development efforts will no longer be duplicated 
> between these two projects
> 
> I wanted to gauge interest from FOP developers and start to think about 
> how we can make this work.  What do you guys think?
> 
> Ben Litchfield
> http://www.pdfbox.org/



Jeremias Maerki