Posted to general@xmlgraphics.apache.org by Jeremias Maerki <de...@jeremias-maerki.ch> on 2007/11/15 09:04:52 UTC

Fw: [DISCUSS] PDFBox proposal

Yesterday, we discussed a possible incubation of PDFBox at the ASF.
There are several projects that are interested in such a move. For us
here in the XML Graphics project, PDFBox is interesting due to its
parsing functionality. Our own PDF library doesn't have that
functionality and is instead optimized for writing PDF, which PDFBox
isn't.

As you may know, I've implemented a FOP plug-in that allows embedding of
PDF in newly generated PDF documents through XSL-FO. Using the same PDF
library for both tasks would be beneficial in the long term.

Please take a look at the incubation proposal (link below) we're
currently writing. I have some questions for the XML Graphics community
in this context:

- Should the XML Graphics PMC be the sponsoring entity? [1]
- Can anyone besides me imagine investing time/resources to help with
the incubation, teaching PDFBox the additional tricks we need?
- Can we imagine PDFBox becoming a subproject of XML Graphics after
successful incubation? PDF is not really an XML technology but deals
with graphical output. Newer technologies like XPS (Microsoft's XML
Paper Specification) and Adobe's Mars are XML-based paged document
formats. Not that they play a big role in the market yet.

[1] Makes sense if we have a strong interest in PDFBox. If it's just me,
then it doesn't make sense and we're going to find a different solution.

Please note: We have some functionality overlap between our PDF library
and PDFBox in any case. Examples:
- Writing PDF (org.apache.fop.pdf)
- Parsing fonts (org.apache.fop.fonts, org.apache.batik.svggen.font.table)
- Font conversion (org.apache.batik.svggen.font)
- XMP metadata (org.apache.xmlgraphics.xmp)
- Image loading (org.apache.fop.image, org.apache.batik.ext.awt.image.spi)

BTW, the above list shows some spots where we could actually discuss
better cooperation within XML Graphics, i.e. between Batik & FOP.

Thoughts?

Forwarded by Jeremias Maerki <de...@jeremias-maerki.ch>
----------------------- Original Message -----------------------
 From:    "Jukka Zitting" <ju...@gmail.com>
 To:      general@incubator.apache.org
 Date:    Thu, 15 Nov 2007 03:08:33 +0200
 Subject: [DISCUSS] PDFBox proposal
----

Hi,

Ben Litchfield, the author of the PDFBox library, has been working
with us at the ApacheCon preparing a proposal to bring PDFBox into the
Apache Incubator. See http://wiki.apache.org/incubator/PDFBoxProposal
for the current draft of the proposal.

Some of the details are yet to be worked out, but the general idea is
there. All comments and questions are welcome!

BR,

Jukka Zitting

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

--------------------- Original Message Ends --------------------


Jeremias Maerki


---------------------------------------------------------------------
Apache XML Graphics Project URL: http://xmlgraphics.apache.org/
To unsubscribe, e-mail: general-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: general-help@xmlgraphics.apache.org


Re: Fw: [DISCUSS] PDFBox proposal

Posted by Chris Bowditch <bo...@hotmail.com>.
Jeremias Maerki wrote:

Hi Jeremias,

sorry for the slow reply.

> Yesterday, we discussed a possible incubation of PDFBox at the ASF.
> There are several projects that are interested in such a move. For us
> here in the XML Graphics project, PDFBox is interesting due to its
> parsing functionality. Our own PDF library doesn't have that
> functionality and is instead optimized for writing PDF, which PDFBox
> isn't.

I agree PDFBox would be a useful complementary library to our own.

> 
> As you may know, I've implemented a FOP plug-in that allows embedding of
> PDF in newly generated PDF documents through XSL-FO. Using the same PDF
> library for both tasks would be beneficial in the long term.

Yes, agreed.

> 
> Please take a look at the incubation proposal (link below) we're
> currently writing. I have some questions for the XML Graphics community
> in this context:
> 
> - Should the XML Graphics PMC be the sponsoring entity? [1]

Yes, but I don't have any time to support such a process.

> - Can anyone besides me imagine investing time/resources to help with
> the incubation, teaching PDFBox the additional tricks we need?

Sorry I just don't have enough time to help.

> - Can we imagine PDFBox becoming a subproject of XML Graphics after
> successful incubation? PDF is not really an XML technology but deals
> with graphical output. Newer technologies like XPS (Microsoft's XML
> Paper Specification) and Adobe's Mars are XML-based paged document
> formats. Not that they play a big role in the market yet.

I am starting to hear clients talking about XPS.

> 
> [1] Makes sense if we have a strong interest in PDFBox. If it's just me,
> then it doesn't make sense and we're going to find a different solution.
> 
> Please note: We have some functionality overlap between our PDF library
> and PDFBox in any case. Examples:
> - Writing PDF (org.apache.fop.pdf)

Since our library is optimized for this, we probably should just leave 
the writing to FOP's PDF Library.

> - Parsing fonts (org.apache.fop.fonts, org.apache.batik.svggen.font.table)
> - Font conversion (org.apache.batik.svggen.font)
> - XMP metadata (org.apache.xmlgraphics.xmp)
> - Image loading (org.apache.fop.image, org.apache.batik.ext.awt.image.spi)
> 
> BTW, the above list shows some spots where we could actually discuss
> better cooperation within XML Graphics, i.e. between Batik & FOP.

Merging the other packages will be more of a challenge if the end
result is to keep the best that each project has to offer.

> 
> Thoughts?

I vote +1 in favour but as already mentioned I can't help in this process.

Chris

<snip/>





Re: [DISCUSS] PDFBox proposal

Posted by Vincent Hennebert <vi...@anyware-tech.com>.
Hi Jeremias,

Jeremias Maerki wrote:
> Comments inline...
>
> On 15.11.2007 11:56:38 Vincent Hennebert wrote:
> <snip/>
>>> Thoughts?
>> A few. I lack some skills in this whole area, but I hope they
>> will be useful:
>> - my understanding is that our PDF library is quite specialized for
>>   producing output from the area tree.
>
> Not at all! The PDF library doesn't know a thing about the area tree.

Ok, I stand corrected. The two different pdf packages (o.a.f.pdf and
o.a.f.render.pdf) suddenly make sense to me ;-) (that said,
I’ve never really tried to understand the why and how).

<snip/>
> The embedding of fonts is usually output format specific and IMO doesn't
> belong in a general font library. But otherwise, it's true a common
> library for handling fonts is useful, which is why Victor created the
> foray-font module and why Ben created the FontBox SF project. And I
> simply need to move the font package out to Commons when I move the PDF
> library so we can transfer the PDFTranscoder over to Batik.

Yeah, in an ideal world we would probably have a font package as
a sub-project of XML Graphics and PDFBox as a top-level project
depending on it.


>> - that said, we would probably benefit from a general-purpose PDF
>>   library that would provide us with extra functionality like
>>   encryption (and tagged PDF?). It might make sense to keep our output
>>   library in a minimal form, and use PDFBox as a post-processor for
>>   optimizing the output or adding encryption or whatever.
>>   You mentioned a PDF/A validator, but even a general PDF validator
>>   would perhaps be useful.
>
> Hey, we have encryption, although not the strong stuff, yet. I don't
> like the post-processing idea at all if it can be avoided.
> Post-processing always means performance loss.

Sure, but it can be kept optional for those features that can’t be
easily implemented in a one-pass approach. For example, IIRC, FOP doesn’t
produce linearized PDFs for incremental access over a network. That is
typically something that requires two passes (just like compilers use
several intermediate steps before producing binaries). And here PDFBox
might be of interest.
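
To illustrate why some layouts need a second pass, here is a toy sketch.
It is not PDF linearization itself, and every name in it is made up for
this mail. The output starts with an offset table, but each offset depends
on the lengths of all objects before it, so the writer has to measure
everything before it can emit the first byte. The sketch assumes ASCII-only
content, so character counts equal byte counts.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical two-pass writer; an illustration only, not real PDF output. */
final class TwoPassSketch {

    /** Each table line is 8 zero-padded digits plus a newline. */
    private static final int TABLE_LINE_WIDTH = 9;

    /**
     * Returns the offset table followed by the objects. The table comes
     * first, yet it can only be filled in once every object was measured.
     */
    static String write(List<String> objects) {
        // Pass 1: measure. The table size is fixed by the object count,
        // so the first object starts right after it.
        int position = objects.size() * TABLE_LINE_WIDTH;
        int[] offsets = new int[objects.size()];
        for (int i = 0; i < objects.size(); i++) {
            offsets[i] = position;
            position += objects.get(i).length(); // ASCII assumed
        }

        // Pass 2: emit the table, then the objects in their original order.
        StringBuilder out = new StringBuilder();
        for (int offset : offsets) {
            out.append(String.format("%08d\n", offset));
        }
        for (String object : objects) {
            out.append(object);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(write(Arrays.asList("abc", "de")));
    }
}
```

A one-pass writer would have to put such a table at the end of the output
instead, which is exactly what a reader fetching the file incrementally
does not want.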


>> I’m slightly doubtful it would make sense to have PDFBox as an XML
>> Graphics subproject, because it has both too many and not enough
>> features for our needs. Although it’s obvious that stuff can be shared
>> between the projects, and that one would have its place as
>> a sub-project. But PDFBox probably deserves to be a top-level one, all
>> the more if other Apache projects would also have a use of it. For us
>> that would be a dependency, like the other jars in the lib/ directory.
>> That said, had I to vote, that would probably be a +0.9.
>
> I agree that PDFBox should become a TLP if possible. The problem could
> be the same as with FOP/Batik: Established technology does not attract
> many new developers. Most people take it as a given. I fear
> that PDFBox might not attract enough of a developer community. I see it
> myself: We have a great PDF library which basically lacks just one
> feature, and for that I/we should invest a lot of time merging two PDF
> libraries into one? I don't know how this will work out.

I agree, although I’m not sure to what conclusion that leads... XML
Graphics sub-project or not? In fact the success of the incubation
itself seems to be questionable.
For once the problem is more technical than political!

Vincent



Re: [DISCUSS] PDFBox proposal

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Comments inline...

On 15.11.2007 11:56:38 Vincent Hennebert wrote:
<snip/>
> > Thoughts?
> 
> A few. I lack some skills in this whole area, but I hope they
> will be useful:
> - my understanding is that our PDF library is quite specialized for 
>   producing output from the area tree.

Not at all! The PDF library doesn't know a thing about the area tree.
The PDFRenderer is the adapter that translates the area tree into the
PDF library's object model. What our PDF library is specialized in is
producing new PDFs with a low memory profile and high speed. The PDF
library could easily be used by any other project that wants to create
PDF files. That's actually the reason why I still want to move our own
PDF library to XML Graphics Commons (besides making a clean dependency
tree between Batik and FOP).
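
As a rough illustration of that split, here is a sketch only. The class
names below are made up for this mail and are not the actual FOP classes.
The stand-in PDF library emits plain content-stream operators and has no
reference to any area type; the stand-in renderer is the only class that
sees both sides. Font selection and string escaping are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the renderer-as-adapter idea; not FOP's real code. */
final class AdapterSketch {

    /** Stand-in for a positioned piece of text in the area tree. */
    static final class TextArea {
        final int x;
        final int y;
        final String text;

        TextArea(int x, int y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Stand-in for the PDF library: it only knows PDF text operators. */
    static final class PdfContentStream {
        private final List<String> ops = new ArrayList<>();

        void beginText() { ops.add("BT"); }
        void moveTo(int x, int y) { ops.add(x + " " + y + " Td"); }
        void showText(String s) { ops.add("(" + s + ") Tj"); } // no escaping here
        void endText() { ops.add("ET"); }
        List<String> operators() { return ops; }
    }

    /** Stand-in for the renderer: translates areas into library calls. */
    static final class PdfRendererSketch {
        PdfContentStream render(List<TextArea> areas) {
            PdfContentStream out = new PdfContentStream();
            for (TextArea area : areas) {
                out.beginText();
                out.moveTo(area.x, area.y);
                out.showText(area.text);
                out.endText();
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<TextArea> areas = new ArrayList<>();
        areas.add(new TextArea(72, 720, "Hello"));
        for (String op : new PdfRendererSketch().render(areas).operators()) {
            System.out.println(op);
        }
    }
}
```

Because PdfContentStream never mentions TextArea, any other producer
(a transcoder, for instance) could drive it directly, which is the point
of moving the library out to Commons.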

>   In the end there is probably some 
>   common stuff that can be factored out of the several renderers 
>   (mainly: AFP, PostScript, PDF). I’m not sure PDFBox would integrate 
>   smoothly in that scheme.

Common stuff for the area tree is already factored out as far as
possible: AbstractRenderer, PrintRenderer and
AbstractPathOrientedRenderer.

> - to a certain extent there may be the same issue with fonts. Our needs 
>   go slightly further than just parsing PostScript/TrueType/OpenType 
>   fonts and embedding them in the output format. We also need to embed 
>   them in PostScript, or convert them into AWT, possibly AFP, etc. 
>   Ideally another sub-project dedicated to fonts, on which 
>   FOP/Batik/PDFBox would rely, would probably be necessary.

The embedding of fonts is usually output format specific and IMO doesn't
belong in a general font library. But otherwise, it's true a common
library for handling fonts is useful, which is why Victor created the
foray-font module and why Ben created the FontBox SF project. And I
simply need to move the font package out to Commons when I move the PDF
library so we can transfer the PDFTranscoder over to Batik.

> - that said, we would probably benefit from a general-purpose PDF 
>   library that would provide us with extra functionality like 
>   encryption (and tagged PDF?). It might make sense to keep our output 
>   library in a minimal form, and use PDFBox as a post-processor for 
>   optimizing the output or adding encryption or whatever.
>   You mentioned a PDF/A validator, but even a general PDF validator 
>   would perhaps be useful.

Hey, we have encryption, although not the strong stuff, yet. I don't
like the post-processing idea at all if it can be avoided.
Post-processing always means performance loss.

> I’m slightly doubtful it would make sense to have PDFBox as an XML 
> Graphics subproject, because it has both too many and not enough 
> features for our needs. Although it’s obvious that stuff can be shared 
> between the projects, and that one would have its place as 
> a sub-project. But PDFBox probably deserves to be a top-level one, all 
> the more if other Apache projects would also have a use of it. For us 
> that would be a dependency, like the other jars in the lib/ directory. 
> That said, had I to vote, that would probably be a +0.9.

I agree that PDFBox should become a TLP if possible. The problem could
be the same as with FOP/Batik: Established technology does not attract
many new developers. Most people take it as a given. I fear
that PDFBox might not attract enough of a developer community. I see it
myself: We have a great PDF library which basically lacks just one
feature, and for that I/we should invest a lot of time merging two PDF
libraries into one? I don't know how this will work out.

> Hope that all makes sense,
> Vincent


Jeremias Maerki



Re: Fw: [DISCUSS] PDFBox proposal

Posted by Vincent Hennebert <vi...@anyware-tech.com>.
Hi,

Jeremias Maerki wrote:
> Yesterday, we discussed a possible incubation of PDFBox at the ASF.
> There are several projects that are interested in such a move. For us
> here in the XML Graphics project, PDFBox is interesting due to its
> parsing functionality. Our own PDF library doesn't have that
> functionality and is instead optimized for writing PDF, which PDFBox
> isn't.
> 
> As you may know, I've implemented a FOP plug-in that allows embedding of
> PDF in newly generated PDF documents through XSL-FO. Using the same PDF
> library for both tasks would be beneficial in the long term.
> 
> Please take a look at the incubation proposal (link below) we're
> currently writing. I have some questions for the XML Graphics community
> in this context:
> 
> - Should the XML Graphics PMC be the sponsoring entity? [1]

A small reservation, only because to me PDFBox is “more than that”. See 
below.


> - Can anyone besides me imagine investing time/resources to help with
> the incubation, teaching PDFBox the additional tricks we need?

I’m afraid not. Not due to a lack of interest, but only time really. And 
I’d already like to help with the incubation of Jeuclid (even if 
I haven’t done anything in this area so far :-\).


> - Can we imagine PDFBox becoming a subproject of XML Graphics after
> successful incubation? PDF is not really an XML technology but deals
> with graphical output.

This aspect is not a problem for me. We already have PostScript-related 
stuff in Commons, which doesn’t have anything to do with XML either. In 
the long term we should probably emphasise the “Graphics” part of the 
project’s name.


> Newer technologies like XPS (Microsoft's XML
> Paper Specification) and Adobe's Mars are XML-based paged document
> formats. Not that they play a big role in the market yet.
> 
> [1] Makes sense if we have a strong interest in PDFBox. If it's just me,
> then it doesn't make sense and we're going to find a different solution.
> 
> Please note: We have some functionality overlap between our PDF library
> and PDFBox in any case. Examples:
> - Writing PDF (org.apache.fop.pdf)
> - Parsing fonts (org.apache.fop.fonts, org.apache.batik.svggen.font.table)
> - Font conversion (org.apache.batik.svggen.font)
> - XMP metadata (org.apache.xmlgraphics.xmp)
> - Image loading (org.apache.fop.image, org.apache.batik.ext.awt.image.spi)
> 
> BTW, the above list shows some spots where we could actually discuss
> better cooperation within XML Graphics, i.e. between Batik & FOP.
> 
> Thoughts?

A few. I lack some skills in this whole area, but I hope they 
will be useful:
- my understanding is that our PDF library is quite specialized for 
  producing output from the area tree. In the end there is probably some 
  common stuff that can be factored out of the several renderers 
  (mainly: AFP, PostScript, PDF). I’m not sure PDFBox would integrate 
  smoothly in that scheme.
- to a certain extent there may be the same issue with fonts. Our needs 
  go slightly further than just parsing PostScript/TrueType/OpenType 
  fonts and embedding them in the output format. We also need to embed 
  them in PostScript, or convert them into AWT, possibly AFP, etc. 
  Ideally another sub-project dedicated to fonts, on which 
  FOP/Batik/PDFBox would rely, would probably be necessary.
- that said, we would probably benefit from a general-purpose PDF 
  library that would provide us with extra functionality like 
  encryption (and tagged PDF?). It might make sense to keep our output 
  library in a minimal form, and use PDFBox as a post-processor for 
  optimizing the output or adding encryption or whatever.
  You mentioned a PDF/A validator, but even a general PDF validator 
  would perhaps be useful.

I’m slightly doubtful it would make sense to have PDFBox as an XML 
Graphics subproject, because it has both too many and not enough 
features for our needs. Although it’s obvious that stuff can be shared 
between the projects, and that one would have its place as 
a sub-project. But PDFBox probably deserves to be a top-level one, all 
the more if other Apache projects would also have a use of it. For us 
that would be a dependency, like the other jars in the lib/ directory. 
That said, had I to vote, that would probably be a +0.9.


Hope that all makes sense,
Vincent
