Posted to fop-dev@xmlgraphics.apache.org by Victor Mote <vi...@outfitr.com> on 2003/04/30 10:08:59 UTC

eager or patient processing

FOP Developers:

There is a design issue nagging at the back of my mind that I would like to
1) resolve, and 2) document for posterity. Our documentation is probably
self-contradictory, or at least confusing, on this issue, not explicitly but
by implication.

It seems to me that there are two basic processing models in play. The first
I will call "patient" processing, which essentially reads an entire logical
chunk of the input document before processing it. The other I will call
"eager" processing, which looks at a document as it goes past a window,
grabs information out of it, processes it, and tries to get it pushed into
an output pipeline as quickly as possible.
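To make the distinction concrete, here is a toy sketch in Java. It is not
FOP code: the class name, the method names, and the use of plain strings as
stand-ins for parser events are all assumptions made for illustration. The
patient version buffers a whole chunk before handing it downstream; the eager
version pushes each event downstream as soon as it is seen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in FOP.
public class ProcessingModels {

    /**
     * Patient: collect events until the end-of-chunk marker, then hand the
     * complete chunk to the sink. Returns the largest number of events that
     * had to be held in memory at once.
     */
    static int patient(Iterator<String> events, String chunkEnd,
                       Consumer<List<String>> sink) {
        List<String> chunk = new ArrayList<>();
        int peak = 0;
        while (events.hasNext()) {
            String e = events.next();
            if (e.equals(chunkEnd)) {
                sink.accept(new ArrayList<>(chunk)); // whole chunk at once
                chunk.clear();
            } else {
                chunk.add(e);
                peak = Math.max(peak, chunk.size());
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(new ArrayList<>(chunk)); // trailing partial chunk
        }
        return peak;
    }

    /**
     * Eager: forward each event to the sink immediately; nothing is buffered
     * here. Returns the number of events forwarded.
     */
    static int eager(Iterator<String> events, String chunkEnd,
                     Consumer<String> sink) {
        int count = 0;
        while (events.hasNext()) {
            String e = events.next();
            if (!e.equals(chunkEnd)) {
                sink.accept(e);
                count++;
            }
        }
        return count;
    }
}
```

In this sketch the patient version's memory use grows with the size of the
largest chunk, while the eager version's does not, but only the patient
version lets the sink see a complete chunk before it has to commit to
anything.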

Now, we have two major places where we have opportunities to choose one of
these models: parsing and layout. Right now, AFAIK, FOP uses patient
processing for parsing (using page-sequence objects as the logical chunk),
and eager processing for layout, both in the maintenance and redesign dev
lines.

One of our stated design goals is to be able to process input files of
arbitrary size. This implies one of two things: 1) we also support
page-sequence objects of arbitrary size, or 2) we don't, i.e. we expect
users to ensure that a document is broken up into reasonably sized
page-sequence objects. Now before I put the question, I need to point out
that choosing option 1 with patient parsing implies that the FO Tree may
need to be serialized. Nevertheless, that seems to me to be the correct
approach. Should we add "support for page-sequence objects of arbitrary
size" to our documented design goals? My vote is:
+1

Now, we have already documented that we may need to contingently handle
serialization of pages that are rendered but that cannot be flushed because
of forward references, etc. My proposal for trying to extrapolate Knuth's
line-breaking algorithm to page layout implies the need for at least the
option of patient processing in the layout process, which in turn implies
the need to be able to serialize the Area Tree. (The good news is that I
don't think you need to serialize both the Area Tree and rendered pages,
since with patient layout, you wouldn't be able to start rendering until the
layout was complete for the page-sequence).

At this point, I need to bring an implementation issue into the discussion.
If any serialization is required at all, which is true even currently, the
implementation questions are: when? how? and by whom? While researching this,
I stumbled onto the concept of memory-mapped files, using the FileChannel
class from the NIO package (java.nio) in Java 1.4. This is no doubt old news
for the Java gurus on this list. If I understand it correctly, it would allow
us to virtually serialize everything, and let Java and the OS worry about
when to actually do disk I/O. This seems to answer the when? and by whom?
questions nicely, but probably not the how? part. So the questions are:
* What is our plan for serialization of our transient data?
* Does the concept of memory-mapped files help that?
* If serialization can be efficiently achieved, do we still need eager
layout processing at all?
* If so, should we let the user choose either eager or patient processing?
Eager processing might result in occasional non-conformant behavior or less
perfect output, but might be very suitable for many applications. Patient
processing might require more memory, more disk I/O and more processing, but
might be more suitable for other situations.
* If both are needed, can we accommodate that in our redesign?
* Can we remove the design requirement in our doc that pages be rendered
ASAP? My vote:
+1
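
For reference on the memory-mapped file question, here is a minimal sketch of
what mapping a scratch file through FileChannel looks like. It is an
illustration under assumptions, not a proposal for FOP's serialization
format: the class name, the method name, and the length-prefixed record
layout are made up, and for brevity it uses java.nio.file and
StandardCharsets, which are newer than Java 1.4. The map() call itself is the
Java 1.4 NIO API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of FOP.
public class MappedScratch {

    /**
     * Writes one length-prefixed record into a memory-mapped temp file and
     * reads it back through the same mapping. The program never issues an
     * explicit disk write; the JVM and the OS decide when pages are flushed.
     */
    static String roundTrip(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("scratch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE past the current end extends the file.
                MappedByteBuffer buf = ch.map(
                        FileChannel.MapMode.READ_WRITE, 0, 4 + bytes.length);
                buf.putInt(bytes.length); // 4-byte length prefix
                buf.put(bytes);           // record payload
                buf.rewind();             // back to the start of the mapping
                byte[] out = new byte[buf.getInt()];
                buf.get(out);
                return new String(out, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("area tree for page-sequence 1"));
    }
}
```

Note that this only covers the "when?" and "by whom?" parts: the code still
has to decide how objects are laid out as bytes in the buffer, which is the
"how?" question left open above.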

Thanks for your help. I think it will be a useful hurdle to clear if we can
resolve this and document it. BTW, I am not looking to implement any of this
right away. I am trying to make sure our design is robust enough to handle
what we want here, after we figure out what that is.

Victor Mote

