Posted to users@pdfbox.apache.org by Jeremias Maerki <de...@jeremias-maerki.ch> on 2011/02/01 08:27:36 UTC

Re: PDFBox capabilities

Hi Gary

What do you mean by "just text"? Is that HTML markup, some kind of Wiki
syntax (since you mention links), or plain text? Either way, if you
generate HTML you have to process the text in some way, right? Going
to XSL-FO is then pretty similar. XSLT could also be used to
convert the HTML to XSL-FO, provided the HTML is first converted to
well-formed XHTML by a tool like JTidy (http://jtidy.sourceforge.net/).

I'm sure Apache PDFBox could do it, but I'm convinced that it would take
more effort and be harder to maintain. Of course, the XSLT/XSL-FO
approach will also take some initial effort, especially the learning
process.

To summarize, I think the processing pipeline would look like this:

1. retrieve the "text" from the DB
2. if necessary, convert it to some XML-based markup (XHTML or whatever).
3. use XSLT to convert the markup to XSL-FO.
4. use Apache FOP to convert the XSL-FO to PDF.
5. stream the PDF to the user
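
As a rough illustration of step 3, here is a minimal sketch that uses the
XSLT processor bundled with the JDK (javax.xml.transform) to turn a small
XHTML fragment into XSL-FO. The class name XhtmlToFo, the stylesheet, and
the sample input are all made up for this example; a real stylesheet would
have to cover far more than <p> elements. Step 4 is left out because it
needs the Apache FOP library on the classpath; the FO string produced here
is what would be handed to FOP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XhtmlToFo {

    // Hypothetical, deliberately tiny stylesheet: one A4 page master, and
    // every XHTML <p> becomes an fo:block. Assumes the input uses the
    // XHTML namespace and contains at least one <p>.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'"
        + " exclude-result-prefixes='h'>"
        + "<xsl:template match='/'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4'"
        + " page-height='29.7cm' page-width='21cm'>"
        + "<fo:region-body margin='2cm'/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='//h:p'/>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='h:p'>"
        + "<fo:block space-after='6pt'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to well-formed XHTML and returns the XSL-FO
    // document as a string.
    static String toFo(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for text fetched from the DB and converted to XHTML.
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<p>Hello from the database</p></body></html>";
        System.out.println(toFo(xhtml));
    }
}
```

Keeping the layout in the stylesheet rather than in Java code is the point
of this approach: a layout change means editing the XSLT, not recompiling.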

For the browser display, I guess the process could be quite similar (not
sure what you have planned):

1. retrieve the "text" from the DB
2. if necessary, convert it to some XML-based markup (XHTML or whatever).
3. insert it into an (X)HTML-based page template (e.g. with XSLT)
4. stream the HTML to the user
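
Step 3 of the browser variant can be sketched the same way. The example
below wraps an XML content fragment in a page template and passes the page
title in as a stylesheet parameter. The class name PageTemplate, the
<content> wrapper element, and the template markup are assumptions made
for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PageTemplate {

    // Hypothetical page template: copies the children of the root element
    // of the input into a <div> inside a fixed page skeleton.
    static final String TEMPLATE =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='title'/>"
        + "<xsl:template match='/'>"
        + "<html>"
        + "<head><title><xsl:value-of select='$title'/></title></head>"
        + "<body><div id='content'>"
        + "<xsl:copy-of select='/*/node()'/>"
        + "</div></body>"
        + "</html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Inserts the fragment into the template and returns the full page.
    static String toPage(String title, String contentXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TEMPLATE)));
            transformer.setParameter("title", title);
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(contentXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for content fetched from the DB.
        String content = "<content><p>Hello from the database</p></content>";
        System.out.println(toPage("Demo page", content));
    }
}
```

In a servlet, the returned string (or better, a StreamResult wrapping the
response's output stream) would be what gets sent to the browser in step 4.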

Note: IMO it's important to keep XSLT and XSL-FO apart even though
together they make up XSL. Each can be used independently of the other,
so the collective term "XSL" is usually not very useful.

On 31.01.2011 18:15:31 Gary Wong wrote:
> Hey Jeremias,
> 
> Thanks for the advice.
> 
> Currently, the DB is just text and links to graphics that have been
> uploaded. No XSL. I am new to this, is it normally (or supposed to be) XSL?
> 
> FOP looks pretty good but I will have to think about it a bit more; since I
> don't know much about XSL and how to do this on the fly.
> 
> I am helping out for a bunch of 50 yr old seniors. I was hoping to not have
> the "user" do too much. Just:
> 
>    - click on an image
>    - opens an HTML page to browse (which hits the DB)
>    - possibly select multiple elements to d/l and then either:
>       - generate PDF into tmpdir and ready for download
>       - have user select download
>    - PDF would be to email to other people
> 
> I am not sure what is normally done. I guess what I am confused about is
> the best way to generate the PDF. And I was told PDFBox could do it for me.
> 
> 
> Thanks!
> 
> g
> 
> ________________________________
> From: Jeremias Maerki <de...@jeremias-maerki.ch>
> To: users@pdfbox.apache.org
> Sent: Wed, January 26, 2011 3:00:46 AM
> Subject: Re: PDFBox capabilities
> 
> Hi Gary,
> 
> Apache PDFBox is a rather low-level PDF library that doesn't offer too
> much in terms of creating documents with complex layouts. If you need
> two-channel output with more or less the same layout for HTML and PDF, I
> suggest you take a look in the direction of XSL-FO, i.e. Apache FOP [1],
> for the PDF part.
> 
> The ASF has tools that support both directions with not too much coding.
> Apache Cocoon [2] is a web framework which can generate various output
> formats including HTML and PDF (via XSLT and Apache FOP). And it can
> take data from a database. Apache Forrest [3] is based on Cocoon and
> does about what you want to do: generate HTML and PDF from the same
> XML-based content format. Well, maybe the latter will not exactly match
> your requirements if you have special layout desires.
> 
> But even if you don't pick Cocoon or something based on Cocoon, Apache
> FOP can help you on the PDF requirement. I assume your content in the DB
> is XML-based. In that case, create an XSLT stylesheet for HTML output
> and one for XSL-FO output. The XSL-FO can then be converted to PDF by
> Apache FOP. Not too much programming involved but you have to know your
> XSLT, HTML and XSL-FO. At any rate, it will be much less work than
> trying to build a layout tool on top of PDFBox and maintaining the Java
> code across layout changes. That's what XSLT is very good at.
> 
> That said, Apache PDFBox is very good when it comes to post-processing
> PDFs. But for creating complex PDFs I think there are more suitable tools.
> I hope no one bites my head off for saying this. ;-)
> 
> HTH
> 
> [1] http://xmlgraphics.apache.org/fop/
> [2] http://cocoon.apache.org/
> [3] http://forrest.apache.org/
> 
> On 25.01.2011 23:22:37 Gary Wong wrote:
> > Hi,
> >
> > Not sure if this is the right place to post. But I couldn't find a features
> > list on the web page. I wanted to see if PDFBox could do this:
> >
> > I want to store images and text in a database (and possibly PDFs). Then have
> > a Java servlet access the DB and build both a web page and a PDF file.
> > The PDF file will be for downloading and the web page for viewing. Both
> > should have the same layout.
> >
> > Would this be hard to do? Any sample code?
> >
> > Thanks!
> >
> > g
> 
> 
> 
> 
> Jeremias Maerki




Jeremias Maerki