Posted to fop-dev@xmlgraphics.apache.org by Mark <ma...@plasticsoftware.com.au> on 2001/05/03 13:44:04 UTC

Large documents, FOP and PDF

G'day,

We have been using Xalan, Xerces, FOP and PDF for the past few months
with generally positive results. We're excited by the whole XSL thing
but we're running into a few problems, so I figure it's time to get my
hands dirty and contribute something. So I thought I would introduce
myself, see if anyone else out there is doing anything similar, and
(hopefully) make some contributions to the FOP code base.

Essentially our problem is that we need to deal with reasonably large
print runs (several thousand pages) generated by our accounting system
(specifically, it's ISP invoices and statements). My problems are
basically that (a) XSLT is going to explode when I give it a huge XML
file and ask it to transform it into XSL-FO, and (b) even if some miracle
occurs and my CPU doesn't melt, I think that FOP is going to explode
once it gets it.

So at this stage I am wondering if there is anyone out there who is
doing anything similar, and what approaches might have been used to
reduce memory use and processing time? My current thought experiment
involves treating each invoice as a separate processing unit (ie,
invoice.xml -> invoice.fo -> invoice.pdf) but the problem with this is
that I need to treat all the invoices as a single bulk print run; at the
moment, the PDF renderer does not seem to be able to do that. It would
be great if I could do ((invoice.xml -> invoice.fo)*) -> invoice.pdf but
this seems impossible at present.
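
To make the thought experiment concrete, the driving loop I have in mind
looks roughly like this. It's only a sketch: the file and directory names
are invented, and I'm assuming the TrAX API that ships with Xalan 2.

    import java.io.File;

    import javax.xml.transform.Templates;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class InvoiceRun {
        public static void main(String[] args) throws Exception {
            // Compile the stylesheet once and reuse it, so the engine
            // does not re-parse invoice.xsl several thousand times.
            TransformerFactory factory = TransformerFactory.newInstance();
            Templates templates =
                factory.newTemplates(new StreamSource(new File("invoice.xsl")));

            File outDir = new File("fo");
            outDir.mkdir();
            File[] invoices = new File("invoices").listFiles();
            for (int i = 0; i < invoices.length; i++) {
                // One small transformation per invoice: memory use stays
                // proportional to a single invoice, not the whole run.
                templates.newTransformer().transform(
                    new StreamSource(invoices[i]),
                    new StreamResult(new File(outDir,
                        invoices[i].getName() + ".fo")));
                // ...then hand each .fo file to FOP, one at a time.
            }
        }
    }

The point is just that the stylesheet is compiled once and each invoice is
transformed on its own, so memory use is bounded by the largest single
invoice. The missing piece is the last step: getting all those little
documents back into one PDF.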

I have a few ideas, and from looking through the code (PDFRenderer.java)
I suspect there are some optimisations and alternative approaches that
would be possible, but the purpose of this email was really to introduce
myself, see if anyone's interested in my problem, and, I guess, to suss
out how my attempts at hacking the FOP code are likely to go down in this
community. I realise I'm a newcomer to the code; maybe a few people
could give me pointers about where I should start or who I should be
talking to?

Regards
Mark

Re: Large documents, FOP and PDF

Posted by Mark <ma...@plasticsoftware.com.au>.
Hi,

> possibly, possibly not. If the transformation is simple, XSLT may be
> overkill. A much simpler SAX-based process would be much faster,
> usually.

Unfortunately the transformations will not always be simple (you used
the word 'illogical'), though there might be better ways to lay out the
data than with XSL-FO. But generally, the thing we like about XSLT is
its ability to do complex stuff, like arithmetic (e.g. totals).

> the merging of all the PDFs seems the right way to me, because
> it's driven by external, illogical demands of your printing process,
> which may change if the printer (physical or human) one day accepts
> a stream of documents

Indeed.

> sure, in theory. But FOP as it stands is predicated on the
> assumption of processing everything in memory. PDF is itself not a
> good format to generate sequentially, IMHO.

In my little excursions through the source I have certainly noticed
this. But I also noticed that the entire PDF document is stored in
memory, and I couldn't see why the output stream couldn't be written to
directly, rather than putting all the objects into a Vector and then
enumerating them. (But I haven't looked very hard at that; probably
there's an excellent reason.)
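
To illustrate what I mean (with made-up names, not FOP's actual classes):
each object could be written the moment it is finished, keeping only its
byte offset, since the cross-reference table at the end of the file needs
nothing more than those offsets.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Vector;

    // Hypothetical sketch: stream each PDF object out as soon as it is
    // complete instead of holding every object until the end of the run.
    public class StreamingPdfWriter {
        private OutputStream out;
        private long position = 0;             // bytes written so far
        private Vector offsets = new Vector(); // object offsets, for the xref table

        public StreamingPdfWriter(OutputStream out) {
            this.out = out;
        }

        // Write one serialised object ("1 0 obj ... endobj") immediately.
        public void writeObject(byte[] object) throws IOException {
            offsets.addElement(new Long(position)); // remember where it starts
            out.write(object);
            position += object.length;
        }

        // The xref table needs only the offset of every object, so it can
        // be emitted last without buffering the objects themselves.
        public void writeXref() throws IOException {
            StringBuffer xref =
                new StringBuffer("xref\n0 " + (offsets.size() + 1) + "\n");
            xref.append("0000000000 65535 f \n"); // the free-list head entry
            for (int i = 0; i < offsets.size(); i++) {
                long off = ((Long) offsets.elementAt(i)).longValue();
                String padded = "0000000000" + off;   // zero-pad to 10 digits
                xref.append(padded.substring(padded.length() - 10));
                xref.append(" 00000 n \n");
            }
            out.write(xref.toString().getBytes());
        }
    }

Only a Vector of Longs stays resident, rather than the serialised objects
themselves. Whether FOP's page layout allows pages to be finished that
early is exactly what I'd need to find out.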

> Also, please please remember that FOP remains a beta-stage, unfinished
> evaluation tool. I always worry that if people start embedding it in
> production systems, it may hamper FOP's future freedom to change in
> radical ways.

I've certainly considered this, and we do not, for example, use any FO
or XSLT extensions, because we want to stick with the W3C wherever
possible. However, the principles behind FOP are not likely to change:
take an FO document and turn it into PDF (or whatever). It seems to me
that if I am successful in including multi-document support in the
PDFRenderer and it's taken out at some point in the future, well, that's
just the risk one takes when one uses free software. Anyway, we can always
stick with the old version until we get a new solution.

Regards
Mark

Re: Large documents, FOP and PDF

Posted by Sebastian Rahtz <se...@computing-services.oxford.ac.uk>.
Mark writes:
 > what they want with them - one of the options being printing. Thus the
 > problem with generating FO directly is that it would need to be derived
 > from the XML, and the best tool to do this would appear to be XSLT.

possibly, possibly not. If the transformation is simple, XSLT may be
overkill. A much simpler SAX-based process would be much faster,
usually.
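
To give the flavour: a ContentHandler that echoes the data straight out
as FO blocks is about as simple as it gets. A sketch only, and the
element names ("line-item") are invented:

    import java.io.PrintWriter;

    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Sketch of a SAX-based invoice.xml -> invoice.fo converter.
    public class InvoiceToFoHandler extends DefaultHandler {
        private PrintWriter out;
        private StringBuffer text = new StringBuffer();

        public InvoiceToFoHandler(PrintWriter out) {
            this.out = out;
        }

        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            text.setLength(0); // start collecting this element's content
        }

        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        public void endElement(String uri, String local, String qName) {
            // Emit one block per line item as it arrives; no tree is ever
            // built, so memory use does not grow with the invoice.
            if ("line-item".equals(qName)) {
                out.println("<fo:block>" + text.toString() + "</fo:block>");
            }
        }
    }

No stylesheet compilation, no result tree in memory; just events in, text
out.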

 > your first suggestion, merging millions of PDFs, would actually be
 > better because we can continue to use XSLT and our current model
 > continues to work.
the merging of all the PDFs seems the right way to me, because
it's driven by external, illogical demands of your printing process,
which may change if the printer (physical or human) one day accepts
a stream of documents

 > However I thought it might be possible to modify the PDFRenderer so that
 > multiple FO trees could be rendered into a single properly formed PDF

sure, in theory. But FOP as it stands is predicated on the
assumption of processing everything in memory. PDF is itself not a
good format to generate sequentially, IMHO.

Also, please please remember that FOP remains a beta-stage, unfinished
evaluation tool. I always worry that if people start embedding it in
production systems, it may hamper FOP's future freedom to change in
radical ways.

Sebastian




Re: Large documents, FOP and PDF

Posted by Mark <ma...@plasticsoftware.com.au>.
Hi,

Sebastian wrote,

> I'd generate 10000 separate PDF documents, and then merge them
> together at the end using PDF manipulation tools (such as Etymon's
> "pj" library).

OK, I hadn't considered that there might be such tools...

> I work on a not-dissimilar project involving phone bills, and we are
> not using XSL at all, because of this sort of problem. We generate
> TeX, which is a much more mature system for running single documents
> of tens of thousands of pages.
> Of course, PassiveTeX would do fine processing the .fo file if you can
> generate it. But I would not use XSLT to do that, your CPU *will*
> melt. Just generate FO straight from the database.

TeX is an interesting idea that we hadn't considered, but for various
reasons I am keen to stick to just one set of tools (ie XML). Being able
to customise the output format of our data is very important to us, so
ideally, our invoices will always be XML and our customers can then do
what they want with them - one of the options being printing. Thus the
problem with generating FO directly is that it would need to be derived
from the XML, and the best tool to do this would appear to be XSLT. So
your first suggestion, merging millions of PDFs, would actually be
better because we can continue to use XSLT and our current model
continues to work.

However, I thought it might be possible to modify the PDFRenderer so that
multiple FO trees could be rendered into a single properly formed PDF
file, which would mean we could solve the problem without resorting to too
many external tools. So that's where I thought I would start looking,
unless someone (you?) points out to me that this is a stupid idea...?
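
Concretely, the calling shape I have in mind is something like this
(purely hypothetical; no such interface exists in FOP today):

    import java.io.OutputStream;

    // Purely hypothetical interface: it only shows the calling shape
    // I am after, not anything FOP currently provides.
    interface MultiDocRenderer {
        void startDocument(OutputStream out); // PDF header written once
        void renderDocument(String foFile);   // pages appended, FO tree then freed
        void endDocument();                   // one xref table and trailer at the end
    }

    class BulkRun {
        static void renderAll(MultiDocRenderer renderer, OutputStream out,
                              String[] foFiles) {
            renderer.startDocument(out);
            for (int i = 0; i < foFiles.length; i++) {
                // Only one invoice's FO tree is live at any moment.
                renderer.renderDocument(foFiles[i]);
            }
            renderer.endDocument();
        }
    }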

Cheers
Mark



Re: Large documents, FOP and PDF

Posted by Sebastian Rahtz <se...@computing-services.oxford.ac.uk>.
Mark writes:
 > reduce memory use and processing time? My current thought experiment
 > involves treating each invoice as a separate processing unit (ie,
 > invoice.xml -> invoice.fo -> invoice.pdf) but the problem with this is
 > that I need to treat all the invoices as a single bulk print run; at the
 > moment, the PDF renderer does not seem to be able to do that. It would
 > be great if I could do ((invoice.xml -> invoice.fo)*) -> invoice.pdf but
 > this seems impossible at present.

I'd generate 10000 separate PDF documents, and then merge them
together at the end using PDF manipulation tools (such as Etymon's
"pj" library).

I work on a not-dissimilar project involving phone bills, and we are
not using XSL at all, because of this sort of problem. We generate
TeX, which is a much more mature system for running single documents
of tens of thousands of pages.

Of course, PassiveTeX would do fine processing the .fo file if you can
generate it. But I would not use XSLT to do that; your CPU *will*
melt. Just generate FO straight from the database.
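
In outline, something like this (the connection URL, table and column
names are invented, and the FO is cut down to a bare skeleton):

    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Sketch: write FO straight from a JDBC query, row by row.
    public class FoFromDatabase {
        public static void main(String[] args) throws Exception {
            Connection con = DriverManager.getConnection("jdbc:odbc:billing");
            PrintWriter out = new PrintWriter(System.out);

            // Minimal FO skeleton: one page master, one flow.
            out.println("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">");
            out.println(" <fo:layout-master-set>");
            out.println("  <fo:simple-page-master master-name=\"invoice\"");
            out.println("      page-height=\"297mm\" page-width=\"210mm\">");
            out.println("   <fo:region-body/>");
            out.println("  </fo:simple-page-master>");
            out.println(" </fo:layout-master-set>");
            out.println(" <fo:page-sequence master-reference=\"invoice\">");
            out.println("  <fo:flow flow-name=\"xsl-region-body\">");

            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery("SELECT customer, total FROM invoices");
            while (rs.next()) {
                // One block per row, written as soon as it is read:
                // nothing accumulates no matter how many rows there are.
                out.println("   <fo:block>" + rs.getString("customer")
                        + ": " + rs.getString("total") + "</fo:block>");
            }
            out.println("  </fo:flow>");
            out.println(" </fo:page-sequence>");
            out.println("</fo:root>");
            out.flush();
            con.close();
        }
    }

No intermediate XML, no transformation step at all; the only thing ever
in memory is the current row.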

Sebastian




Re: Large documents, FOP and PDF

Posted by Weiqi Gao <we...@networkusa.net>.
On 03 May 2001 21:44:04 +1000, Mark wrote:
> 
> possible, but the purpose of this email was really to introduce myself,
> see if anyone's interested in my problem, and, I guess, to suss out how

I'm new to the list too.  It's nice meeting you, Mr. ???? --- Didn't you
just introduce yourself?  I'm sorry, I couldn't remember your last name.
:)

-- 
Weiqi Gao
weiqigao@networkusa.net

