Posted to fop-dev@xmlgraphics.apache.org by Keiron Liddle <ke...@aftexsw.com> on 2003/05/02 07:30:39 UTC

Re: eager or patient processing

> FOP Developers:
> 
> There is a design issue nagging at the back of my mind that I would like to
> 1) resolve, and 2) document for posterity. Our doc right now is probably
> self-contradictory or at least confusing on the issue, not in an explicit
> way, but more by implication.
> 
> It seems to me that there are two basic processing models in play. The first
> I will call "patient" processing, which essentially reads an entire logical
> chunk of the input document before processing it. The other I will call
> "eager" processing, which looks at a document as it goes past a window,
> grabs information out of it, processes it, and tries to get it pushed into
> an output pipeline as quickly as possible.
> 
> Now, we have two major places where we have opportunities to choose one of
> these models: parsing and layout. Right now, AFAIK, FOP uses patient
> processing for parsing (using page-sequence objects as the logical chunk),
> and eager processing for layout, both in the maintenance and redesign dev
> lines.
> 
> One of our stated design goals is to be able to process input files of
> arbitrary size. This implies one of two things: 1) we also support
> page-sequence objects of arbitrary size, or 2) we don't, i.e. we expect
> users to ensure that a document is broken up into reasonable-sized
> page-sequence objects. Now before I put the question, I need to point out
> that choosing option 1 with patient parsing implies that the FO Tree may
> need to be serialized. Nevertheless, that seems to me to be the correct
> approach. Should we add "support for page-sequence objects of arbitrary
> size" to our documented design goals? My vote is:
> +1

+1, I think that was the original intention

> Now, we have already documented that we may need to contingently handle
> serialization of pages that are rendered but that cannot be flushed because
> of forward references, etc. My proposal for trying to extrapolate Knuth's
> line-breaking algorithm to page layout implies the need for at least the
> option of patient processing in the layout process, which in turn implies
> the need to be able to serialize the Area Tree. (The good news is that I
> don't think you need to serialize both the Area Tree and rendered pages,
> since with patient layout, you wouldn't be able to start rendering until the
> layout was complete for the page-sequence).

My question would be: is it adequate for us to do only minimal layout on a
complete page when the reference is resolved? The page with forward
references is complete except for this minimal parsing. Then I can say that the
layout process has no relevance to the area tree. The layout process is done
entirely with layout breaks etc.; once the layout is decided, the area tree is
created.
So this fits in with what is already there in terms of caching of pages in the
area tree.
Caching of the rendered pages should not be needed. If the renderer cannot
render out of order, the area tree caching already handles this, and if it may
need to go back to a previous page (AWT), the area tree acts as a store of
pages which can be cached.
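
A minimal Java sketch of the behaviour described above: finished pages wait in a cache until their forward references resolve, and a renderer that cannot work out of order only receives a page once every earlier page is ready. The names PageQueueSketch, Page, addPage and resolve are hypothetical and are not FOP's actual classes or methods; "rendering" is reduced to recording the page number.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: holds finished pages until their forward references resolve. */
final class PageQueueSketch {
    /** A laid-out page plus the ids it refers to that have not been seen yet. */
    static final class Page {
        final int number;
        final Set<String> unresolved;

        Page(int number, Set<String> unresolved) {
            this.number = number;
            this.unresolved = new HashSet<>(unresolved);
        }
    }

    private final Deque<Page> pending = new ArrayDeque<>();
    private final List<Integer> rendered = new ArrayList<>();

    /** Adds a page in document order and renders whatever is now ready. */
    void addPage(Page page) {
        pending.addLast(page);
        flush();
    }

    /** Resolves an id on every cached page, then renders whatever is now ready. */
    void resolve(String id) {
        for (Page p : pending) {
            p.unresolved.remove(id);
        }
        flush();
    }

    /** Renders strictly in order, stopping at the first page that is still unresolved. */
    private void flush() {
        while (!pending.isEmpty() && pending.peekFirst().unresolved.isEmpty()) {
            rendered.add(pending.removeFirst().number);
        }
    }

    List<Integer> renderedPages() {
        return rendered;
    }

    int cachedPages() {
        return pending.size();
    }
}
```

In this sketch a single unresolved page at the head of the queue holds back every later page, which is the case where the cache has to grow (or spill to disk) until the reference is resolved.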

> At this point, I need to bring an implementation issue into the discussion.
> If any serialization is required at all, which is true even currently, the
> implementation issues are when? how? and by whom? While researching this, I
> stumbled onto the concept of Memory mapped files, using the FileChannel from
> the NIO package in Java 1.4. This is no doubt old news for the java gurus on
> this list. If I understand it correctly, it would allow us to virtually
> serialize everything, and let Java and the o/s worry about when to actually
> do disk i/o. This seems to answer the when? and by whom? questions nicely,
> but probably not the how? part. So the questions are:
> * What is our plan for serialization of our transient data?

Counter question: how good do you think the current area tree page caching is?


> * Does the concept of Memory mapped files help that?

The main issue when caching is keeping the parts you want cached as
independent and minimal as possible. I don't know about memory mapped files so I
can't answer directly.

> * If serialization can be efficiently achieved, do we still need eager
> layout processing at all?

If the processing is done on demand, e.g. over a network, you want the result to
start as soon as possible, so I would say yes. I don't see how patient
processing with caching could ever be as fast as eager.

> * If so, should we let the user choose either eager or patient processing?
> Eager processing might result in occasional non-conformant behavior or less
> perfect output, but might be very suitable for many applications. Patient
> processing might require more memory, more disk i/o and more processing, but
> might be more suitable for other situations.

If you are saying that eager may result in non-conformant behavior, are you
suggesting that eager must cut corners? What about asking instead: should we
make it possible to have quick and dirty processing - or - most conformant
processing?
So yes.

> * If both are needed, can we accommodate that in our redesign?

I believe we can.

> * Can we remove the design requirement in our doc that pages be rendered
> ASAP? My vote:
> +1

I think the ASAP statement is referring to what happens to a page that is added
to the area tree and, as I said above, that is not related to the layout process
apart from resolving forward references.
Maybe the statement could be more specific, but I think we should keep it there.


> Thanks for your help. I think it will be a useful hurdle if we can resolve
> this and document it. BTW, I am not looking to implement any of this right
> away. I am trying to make sure our design is robust enough to handle what we
> want here, after we figure out what that is.
> 
> Victor Mote
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


AW: eager or patient processing

Posted by "J.U. Anderegg" <ha...@bluewin.ch>.
Some observations from page caching by my renderer might be helpful in this
discussion.

o The words of a cached page take about 80KB.

o Cached page data takes about 20KB.

o Removing the word areas right after processing by the renderer saves
140KB. This is memory used by the words in the area tree. Does the area tree
hold data that is never used? Not only by the renderer?

These are very rough figures. However, they indicate to me that large
documents may be processed in memory with an appropriate design.

o Apply the principles of relational databases to eliminate redundancies:
set up tables of unique/used fonts, strokes, colors, ...  and have the
objects reference table entries. This will cost table lookups, CPU. However,
it will also ease state processing. The program does not have to keep track
of inherited properties set 300 FO elements earlier. The nuisance is that
style sheets have acceptable redundancy (see DocBook), XSLT replicates
properties innumerable times and FOP has to recollect and normalize all this
stuff.

o Define a division of work between layouter and renderer. The layouter
deals with font metrics, image and SVG dimensions. The renderer renders.
What's done at which time is important.

o Ship text lines, line fragments to the renderer instead of words.
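
The lookup-table idea in the list above (unique fonts, strokes, colors, with objects referencing table entries) could be sketched in Java roughly as follows. UniqueTable and its methods are hypothetical names chosen for illustration, not part of FOP, and the sketch assumes that stored values implement equals and hashCode sensibly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: stores each distinct value once and hands out small integer keys. */
final class UniqueTable<T> {
    private final Map<T, Integer> index = new HashMap<>();
    private final List<T> entries = new ArrayList<>();

    /** Returns the key for the value, adding it to the table on first use. */
    int keyFor(T value) {
        Integer key = index.get(value);
        if (key == null) {
            key = entries.size();
            entries.add(value);
            index.put(value, key);
        }
        return key;
    }

    /** Returns the value stored under a key previously returned by keyFor. */
    T lookup(int key) {
        return entries.get(key);
    }

    /** Number of distinct values seen so far. */
    int size() {
        return entries.size();
    }
}
```

An area would then hold an int key instead of its own copy of a font or color description, which trades a table lookup for less memory per area.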

There must be a straightforward design with an option for patient and direct
output. Classes, their fields and data structures have to be developed
carefully in parallel with the layout manager. Page caching in a random
access file on disk may be needed for very large documents.

Hansuli Anderegg





RE: eager or patient processing

Posted by Victor Mote <vi...@outfitr.com>.
Peter B. West wrote:

> My mistake; I was reading "rendering" as "layout".  However, they are
> not so dissimilar.  As soon as the layout of a page is final, the
> rendering can proceed.  My point was that, because the FO tree
> construction is, despite the discussion in the Recommendation, dependent
> upon parallel Area tree construction, these processes, at least, *must*
> proceed in parallel.

OK, now I understand. If you allow for patient processing, rendering and
layout have at least the possibility of being done at very different times.
Any page-sequence that is unresolved because of a forward reference can hold
up all subsequent page-sequences as well, since a page-sequence layout
optimization might add or subtract a page from the sequence based on the
actual page number of the forward reference. (Yes, I know there is an
iteration problem here).

> My interest in a pull model for the Area tree is also two-fold.  In the
> first place, if the areas are constructed ASAP, a context is provided
> for the "following" FO expressions.  In the second, I suspect that the
> logic of composition may be illuminated by such an approach.  I have
> mentioned Ken Holman's comments about an unnamed implementation which did
> not construct an Area tree at all.  I had wondered about this, but with
> the approach I am considering, it may be possible to do away with the FO
> tree, by folding it into the Area tree.  I am not at all sure this will
> be possible, but it is an intriguing idea.

I don't recall Ken Holman's comments, but FOP doesn't always use an area
tree either. The StructureRenderers (MIF, RTF) don't build one. So, he might
have been talking about JFor or something similar.

Instead of folding the FO tree and Area tree together, consider having the
Area tree simply point into the FO tree, i.e. be a different "view" of its
data. As part of the text-transform work, I added some methods that
logically connect FOText objects that are in the same block together, so
that you can handle them sequentially. (I think you called it "drawing
threads through the nodes" in your web page, but maybe that was a different
context.) So you might have an Area Tree object that points to 1) an FOText
object, 2) an offset, or starting place, and 3) a size. I also went through
the push/pull thought process several months ago, and decided that there
really is need for a third model, which is the one that optimizes the entire
layout.
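
The "view" idea above, an Area Tree object that points to an FOText object, an offset and a size, might look something like this in Java. This is only a sketch: FoTextNode is a stand-in for an FO tree text node, not FOP's actual FOText class, and TextAreaView is a hypothetical name.

```java
/** Hypothetical sketch: an area that views a slice of FO text instead of copying it. */
final class TextAreaView {
    /** Stand-in for an FO tree text node; not FOP's actual FOText class. */
    static final class FoTextNode {
        final char[] chars;

        FoTextNode(String s) {
            this.chars = s.toCharArray();
        }
    }

    private final FoTextNode source; // 1) the FO text object pointed to
    private final int offset;        // 2) the starting place within it
    private final int length;        // 3) the size of the slice

    TextAreaView(FoTextNode source, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > source.chars.length) {
            throw new IndexOutOfBoundsException("slice outside FO text");
        }
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    /** Materializes the viewed characters only when a caller asks for them. */
    String text() {
        return new String(source.chars, offset, length);
    }
}
```

Several views can share one FoTextNode, so re-breaking a line only changes offsets and sizes rather than copying text.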

Victor Mote




Re: eager or patient processing

Posted by "Peter B. West" <pb...@powerup.com.au>.
Victor Mote wrote:
> Peter B. West wrote:
> 
> 
>>My comments are made with respect to the work I have done with FO tree
>>building, and they are in the nature of thinking out loud.  I notice
>>that you are doing a good deal of refactoring in the
>>area of FO tree building, so I am hoping that you will be of assistance
>>in integrating that work into the main line.
> 
> 
> I'll be very glad to help integrate your work. When I was working on the
> properties doc, I spent quite a bit of time looking at your Alt Design doc,
> and I think I like where you are headed with the properties work.

Your offer, as with those of Jeremias and Rhett, is much appreciated.

>>It seems to me, as you pointed out in your closing comments, that
>>eager/patient is a false distinction.  It implies that there are two
>>major paradigms that can be implemented.  As you and Keiron have already
>>discussed though, the major variable is in the degree of control
>>required over the formatting.  If a user is prepared to tolerate
>>rough-and-ready formatting for things like page numbers, and is prepared
>>to forgo, for example, automatic column widths, then the requirements
>>for deep forward processing are minimised.
> 
> 
> I didn't mean to imply that eager/patient is a false distinction, only that
> the term "ASAP" doesn't necessarily preclude either. I am convinced that we
> need both, but that the one that is negotiable is eager, which is the one
> that we use right now. I think the two models are very different paradigms,
> but not incompatible. I think we will benefit from abstracting some of the
> control out of our subsystem classes up into a higher level control that can
> manage both.
> 
> 
>>However, formatting FOs at any given level of precision imposes its own
>>constraints on the required depth of the FO and Area trees.  In a
>>situation with given formatting parameters, there is neither any
>>possibility of reducing the depth below a certain point, nor any
>>requirement for any greater depth.
>>
>>For example, your idea of optimising an entire page sequence is one that
>>places certain demands on the depth of the trees, but which, in
>>the absence of any other requirements, will not need to maintain more of
>>the trees than is relevant to the page sequence.
>>
>>The default assumption must be that we will need tree information across
>>the whole of the document.  There may be ways to condense the required
>>information, and there may be better or worse ways to provide that data,
>>but some way will be required.  (Your discussion of memory-mapped files
>>is certainly interesting from this point of view.)
> 
> 
> I understand your point, and it is an important one.  Our documentation does
> currently say that the page-sequence can be processed as a chunk:
> http://xml.apache.org/fop/design/fotree.html#recycle
> That will need to be changed after we come to a consensus on how the
> model will work.
> 
> Optimizing a page sequence feeds into the need for retaining more
> information longer, and defines the limits of what I think needs to be
> optimized, but I don't mean to imply that it defines the limits of what
> needs to be retained.
> 
> 
>>Not only is this approach *not* inimical to ASAP rendering, but it
>>requires ASAP rendering.
> 
> 
> I thought I was with you, but I am totally lost here. Please explain. Maybe
> you mean "ASAP layout" instead of "ASAP rendering".

My mistake; I was reading "rendering" as "layout".  However, they are 
not so dissimilar.  As soon as the layout of a page is final, the 
rendering can proceed.  My point was that, because the FO tree 
construction is, despite the discussion in the Recommendation, dependent 
upon parallel Area tree construction, these processes, at least, *must* 
proceed in parallel.

>>The way the push parsing in alt.design works is by automatically
>>balancing parsing and FO tree building activities.  In a naive design,
>>this activity could be extended through the area tree building and the
>>rendering stages to provide balanced producer-consumer pipelines.
>>
>>This is not a naive design, but it seems to me that elements of such a
>>design are essential here.  Notice that in the push parser, the SAX
>>event producer takes a back seat to the FO tree builder, which "knows"
>>what to expect, and is fed XML events on demand from the event buffers.
>>  I'm toying with the idea of extending this to a pull model of area
>>construction, with the area builder "knowing" what is allowed next, and
>>feeding requests to the FO tree builder, complete with context.  This
>>would also include the cases where the context is uncertain or just
>>plain unknown.  I'll play around with this a bit more and post
>>any findings.
> 
> 
> I spent quite a bit of time looking at your parsing doc, and I interpreted
> it to be a way of handling eager processing more eagerly: "The sub-systems -
> xml parsing, FO tree construction, Area tree construction and rendering -
> must run in parallel if the footprint is to be kept manageable." I don't
> have a problem with that, but I want to make sure that we have room in our
> model for managing the document in a different way.

The push parsing was implemented for two reasons.  Primarily, I wanted
the parsing to be hierarchically structured, so that one could see what 
was going on by reading down the code from the top of the FO Tree builder.

Secondly, I wanted to be able to parallelise (!) the activities in a 
manner analogous to building pipelined commands in the unix shell.

E.g.

verbose_cmd | egrep 'interesting|unusual' | sed 's/unusual/bizarre/g'

The same result can be accomplished by executing each of the commands in 
turn.

verbose_cmd >/tmp/file; \
egrep 'interesting|unusual' </tmp/file >/tmp/file2; \
sed 's/unusual/bizarre/g' </tmp/file2

In the first version, the pipes buffer the output from the earlier 
process into the input of the following.  When the pipe between egrep 
and sed fills (because sed hasn't got around to emptying it yet) egrep 
hangs on output.  egrep's input buffer fills up, and verbose_cmd hangs
on output.  As sed processes its input buffers, the effect ripples through,
and the downstream processes are able to resume.

The second version achieves the same result, but at the cost of 
buffering all of the output from each of the commands.
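
The same back-pressure can be sketched in Java with bounded queues standing in for the pipes. This is an illustration of the shell analogy only, not the alt.design code: PipelineSketch, its run method and the EOF sentinel are hypothetical, and the three stages simply mimic verbose_cmd, egrep and sed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: three stages joined by bounded queues, like a shell pipeline. */
final class PipelineSketch {
    // Unique end-of-stream marker, compared by identity so no input line can collide with it.
    private static final String EOF = new String("<eof>");

    static List<String> run(List<String> input, int pipeCapacity) {
        final BlockingQueue<String> pipe1 = new ArrayBlockingQueue<>(pipeCapacity);
        final BlockingQueue<String> pipe2 = new ArrayBlockingQueue<>(pipeCapacity);
        final List<String> output = new ArrayList<>();

        // Stage 1 (verbose_cmd): put() blocks while pipe1 is full.
        Thread producer = new Thread(() -> {
            try {
                for (String line : input) {
                    pipe1.put(line);
                }
                pipe1.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2 (egrep): take() blocks while pipe1 is empty, put() while pipe2 is full.
        Thread filter = new Thread(() -> {
            try {
                for (String line = pipe1.take(); line != EOF; line = pipe1.take()) {
                    if (line.contains("interesting") || line.contains("unusual")) {
                        pipe2.put(line);
                    }
                }
                pipe2.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        filter.start();

        // Stage 3 (sed) runs on the calling thread and drains pipe2.
        try {
            for (String line = pipe2.take(); line != EOF; line = pipe2.take()) {
                output.add(line.replace("unusual", "bizarre"));
            }
            producer.join();
            filter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;
    }
}
```

With a small pipeCapacity, no stage can run far ahead of the one after it, so at most a few lines are buffered at any time. The second shell version corresponds to collecting each stage's complete output in a list before starting the next stage.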

My interest in a pull model for the Area tree is also two-fold.  In the 
first place, if the areas are constructed ASAP, a context is provided 
for the "following" FO expressions.  In the second, I suspect that the 
logic of composition may be illuminated by such an approach.  I have 
mentioned Ken Holman's comments about an unnamed implementation which did
not construct an Area tree at all.  I had wondered about this, but with 
the approach I am considering, it may be possible to do away with the FO 
tree, by folding it into the Area tree.  I am not at all sure this will 
be possible, but it is an intriguing idea.

Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"




RE: eager or patient processing

Posted by Victor Mote <vi...@outfitr.com>.
Peter B. West wrote:

> My comments are made with respect to the work I have done with FO tree
> building, and they are in the nature of thinking out loud.  I notice
> that you are doing a good deal of refactoring in the
> area of FO tree building, so I am hoping that you will be of assistance
> in integrating that work into the main line.

I'll be very glad to help integrate your work. When I was working on the
properties doc, I spent quite a bit of time looking at your Alt Design doc,
and I think I like where you are headed with the properties work.

> It seems to me, as you pointed out in your closing comments, that
> eager/patient is a false distinction.  It implies that there are two
> major paradigms that can be implemented.  As you and Keiron have already
> discussed though, the major variable is in the degree of control
> required over the formatting.  If a user is prepared to tolerate
> rough-and-ready formatting for things like page numbers, and is prepared
> to forgo, for example, automatic column widths, then the requirements
> for deep forward processing are minimised.

I didn't mean to imply that eager/patient is a false distinction, only that
the term "ASAP" doesn't necessarily preclude either. I am convinced that we
need both, but that the one that is negotiable is eager, which is the one
that we use right now. I think the two models are very different paradigms,
but not incompatible. I think we will benefit from abstracting some of the
control out of our subsystem classes up into a higher level control that can
manage both.

> However, formatting FOs at any given level of precision imposes its own
> constraints on the required depth of the FO and Area trees.  In a
> situation with given formatting parameters, there is neither any
> possibility of reducing the depth below a certain point, nor any
> requirement for any greater depth.
>
> For example, your idea of optimising an entire page sequence is one that
> places certain demands on the depth of the trees, but which, in
> the absence of any other requirements, will not need to maintain more of
> the trees than is relevant to the page sequence.
>
> The default assumption must be that we will need tree information across
> the whole of the document.  There may be ways to condense the required
> information, and there may be better or worse ways to provide that data,
> but some way will be required.  (Your discussion of memory-mapped files
> is certainly interesting from this point of view.)

I understand your point, and it is an important one.  Our documentation does
currently say that the page-sequence can be processed as a chunk:
http://xml.apache.org/fop/design/fotree.html#recycle
That will need to be changed after we come to a consensus on how the
model will work.

Optimizing a page sequence feeds into the need for retaining more
information longer, and defines the limits of what I think needs to be
optimized, but I don't mean to imply that it defines the limits of what
needs to be retained.

> Not only is this approach *not* inimical to ASAP rendering, but it
> requires ASAP rendering.

I thought I was with you, but I am totally lost here. Please explain. Maybe
you mean "ASAP layout" instead of "ASAP rendering".

> The way the push parsing in alt.design works is by automatically
> balancing parsing and FO tree building activities.  In a naive design,
> this activity could be extended through the area tree building and the
> rendering stages to provide balanced producer-consumer pipelines.
>
> This is not a naive design, but it seems to me that elements of such a
> design are essential here.  Notice that in the push parser, the SAX
> event producer takes a back seat to the FO tree builder, which "knows"
> what to expect, and is fed XML events on demand from the event buffers.
>   I'm toying with the idea of extending this to a pull model of area
> construction, with the area builder "knowing" what is allowed next, and
> feeding requests to the FO tree builder, complete with context.  This
> would also include the cases where the context is uncertain or just
> plain unknown.  I'll play around with this a bit more and post
> any findings.

I spent quite a bit of time looking at your parsing doc, and I interpreted
it to be a way of handling eager processing more eagerly: "The sub-systems -
xml parsing, FO tree construction, Area tree construction and rendering -
must run in parallel if the footprint is to be kept manageable." I don't
have a problem with that, but I want to make sure that we have room in our
model for managing the document in a different way.

Victor Mote




Re: eager or patient processing

Posted by "Peter B. West" <pb...@powerup.com.au>.
Victor,

Firstly, thanks for the great work you are doing on the documentation.

I seem to have missed the original of this post, but I think Keiron has
included most of it.

My comments are made with respect to the work I have done with FO tree
building, and they are in the nature of thinking out loud.  I notice
that you are doing a good deal of refactoring in the
area of FO tree building, so I am hoping that you will be of assistance
in integrating that work into the main line.

It seems to me, as you pointed out in your closing comments, that 
eager/patient is a false distinction.  It implies that there are two 
major paradigms that can be implemented.  As you and Keiron have already 
discussed though, the major variable is in the degree of control 
required over the formatting.  If a user is prepared to tolerate 
rough-and-ready formatting for things like page numbers, and is prepared 
to forgo, for example, automatic column widths, then the requirements 
for deep forward processing are minimised.

However, formatting FOs at any given level of precision imposes its own
constraints on the required depth of the FO and Area trees.  In a
situation with given formatting parameters, there is neither any
possibility of reducing the depth below a certain point, nor any
requirement for any greater depth.

For example, your idea of optimising an entire page sequence is one that
places certain demands on the depth of the trees, but which, in
the absence of any other requirements, will not need to maintain more of
the trees than is relevant to the page sequence.

The default assumption must be that we will need tree information across
the whole of the document.  There may be ways to condense the required
information, and there may be better or worse ways to provide that data,
but some way will be required.  (Your discussion of memory-mapped files
is certainly interesting from this point of view.)

Not only is this approach *not* inimical to ASAP rendering, but it
requires ASAP rendering.

Paul Grosso has just responded to two questions which had implications
for the FO tree building process, although these have not yet appeared
in the Disposition of Comments http://www.w3.org/Style/XSL/2003/01/FO-DoC

The most interesting one is the confirmation that FO expression
resolution depends on "composition".  Here is that response:

<quote>
>>Please clarify the editors' expectation for the resolution of
>>an expression like "25% + 3pt" in an FO where the relative value
>>is resolved in terms of an enclosing reference area.
>>
>>The difficulty with respect to such expressions that I have
>>experienced in implementing FO tree building is that they force
>>property resolution into dependency on Area tree construction.


This is true.  Some uses of percentages do require feedback from
the result of area tree construction.


>>My current implementation of FO tree building rejects
>>expressions such as "25% + 3pt" on the basis that (part of)
>>"...the expression value cannot be converted to the necessary
>>type for the property value," (5.9.12) within the context of
>>the building of the FO tree.


We may consider making a modification to XSL 1.1 that makes
the handling of percentages in places that require dependency
on composition part of extended level conformance, but leaves
the handling of those that can be computed before formatting as
part of the base conformance level.

</quote>

The fact that the editors would consider such a change is indicative of
their appreciation that the spec makes difficult implementation demands
in this area.  However, I think that expressions like "25% + 3pt" flow
so naturally off the keyboard that they must be considered fundamental.
  The main implication of this is that the whole description of the flow
of processing as it stands in the spec needs to be rewritten to stress
the fact that certain FO expressions will trigger pipelines of
look-ahead "composition" which may themselves involve some nested
backtracking.

The bottom line of Paul's comments, as I see it, is that FO expressions
cannot, in general, be resolved until the area tree which forms their
context has been constructed.
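
To make the dependency concrete, here is a small Java sketch of a length such as "25% + 3pt" that stays symbolic during FO tree building and is only turned into a number once layout supplies the reference dimension. DeferredLength is a hypothetical class, not the alt.design or FOP property code, and it assumes the percentage refers to a single reference length measured in points.

```java
/** Hypothetical sketch: a length like "25% + 3pt" kept symbolic until its base is known. */
final class DeferredLength {
    private final double percent; // fraction of the reference length, e.g. 0.25 for 25%
    private final double points;  // absolute part, in points

    DeferredLength(double percent, double points) {
        this.percent = percent;
        this.points = points;
    }

    /** True when the value can already be computed while building the FO tree. */
    boolean isAbsolute() {
        return percent == 0.0;
    }

    /** Resolves against the reference area's dimension, which is known only after layout. */
    double resolve(double referenceLengthInPoints) {
        return percent * referenceLengthInPoints + points;
    }
}
```

For example, new DeferredLength(0.25, 3.0) cannot yield a value during FO tree building, but resolves to 103 points once the enclosing reference area is known to be 400 points wide.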

The way the push parsing in alt.design works is by automatically
balancing parsing and FO tree building activities.  In a naive design,
this activity could be extended through the area tree building and the
rendering stages to provide balanced producer-consumer pipelines.

This is not a naive design, but it seems to me that elements of such a
design are essential here.  Notice that in the push parser, the SAX
event producer takes a back seat to the FO tree builder, which "knows"
what to expect, and is fed XML events on demand from the event buffers. 
  I'm toying with the idea of extending this to a pull model of area 
construction, with the area builder "knowing" what is allowed next, and 
feeding requests to the FO tree builder, complete with context.  This 
would also include the cases where the context is uncertain or just 
plain unknown.  I'll play around with this a bit more and post any findings.

Peter

Victor Mote wrote:
> Keiron Liddle wrote:
> 
>>
>>I think the ASAP statement is referring to what happens to a page
>>that is added to
>>the area tree and as I said above that is not related to the
>>layout process apart
>>from resolving forward references.
>>Maybe the statement could be more specific, but I think we should
>>keep it there.
> 
> 
> I understand what you are saying. Actually "ASAP" still leaves room for
> patient processing -- it just isn't Possible As Soon As it is for eager. I
> just want to make sure that this statement doesn't back us into an
> always-eager model.

-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"





RE: eager or patient processing

Posted by Victor Mote <vi...@outfitr.com>.
Keiron Liddle wrote:

> > Now, we have already documented that we may need to contingently handle
> > serialization of pages that are rendered but that cannot be flushed because
> > of forward references, etc. My proposal for trying to extrapolate Knuth's
> > line-breaking algorithm to page layout implies the need for at least the
> > option of patient processing in the layout process, which in turn implies
> > the need to be able to serialize the Area Tree. (The good news is that I
> > don't think you need to serialize both the Area Tree and rendered pages,
> > since with patient layout, you wouldn't be able to start rendering until the
> > layout was complete for the page-sequence).
>
> My question would be: is it adequate for us to do only minimal layout on a
> complete page when the reference is resolved? The page with forward
> references is complete except for this minimal parsing. Then I can say that the
> layout process has no relevance to the area tree. The layout process is done
> entirely with layout breaks etc.; once the layout is decided, the area tree is
> created.
> So this fits in with what is already there in terms of caching of pages in the
> area tree.
> Caching of the rendered pages should not be needed. If the renderer cannot
> render out of order, the area tree caching already handles this, and if it may
> need to go back to a previous page (AWT), the area tree acts as a store of
> pages which can be cached.

First, thanks for your thoughtful response. I think I understand how we are
handling the forward reference issue. The new twist that I am trying to
explore is the possibility of using an algorithm that will optimize (for
quality, not speed) the layout of an entire page sequence. It seems to me
that this implies keeping not only the entire Area Tree (for that page
sequence) available, but also the entire FO tree. A similar issue would be a
5 million row table with auto sizing.

> > but probably not the how? part. So the questions are:
> > * What is our plan for serialization of our transient data?
>
> Counter question: how good do you think the current area tree page caching is?

I honestly don't know. Our doc speaks of this in the future tense & I didn't
realize it had already been implemented. The issue, from a performance
standpoint, is that you don't want to ever hit the disk unless you have to.
If you have a bunch of available memory, you hope you can just keep what you
need there instead of ever writing to disk. So, you either have to 1)
serialize everything, or 2) decide when there is not enough memory any more
and actually do the disk i/o. I assume (without looking at the code) that we
have implemented the former. The latter seemed to me like a non-trivial
task, and I wasn't even sure how to go about it.  While looking for those
answers, I saw the Memory mapped file stuff. If I understand it, you
serialize to them as if they were disk files, but Java works with the o/s to
keep them in memory to the extent possible. The doc I read (O'Reilly's
"Learning Java", 2nd edition, page 328) says: "When a file is memory mapped,
like magic it becomes accessible through a single ByteBuffer--just as if the
entire file was read into memory at once. The implementation of this is
extremely efficient, generally among the fastest ways to access the data....
The reason for this is that all modern operating systems are based on the
idea of virtual memory.... So memory mapping a file is really just taking
advantage of what the OS is doing internally."
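
To make the memory-mapped idea concrete, here is a minimal sketch in Java of
writing a block of serialized bytes through a mapped buffer and reading it
back. It is only an illustration under stated assumptions: MappedPageStore
and its methods are hypothetical names, not FOP code, and it uses the
java.nio.file API, which is newer than the java.nio classes the book
describes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of FOP: keeps one serialized page in a
// memory-mapped file so the OS decides what stays in RAM.
class MappedPageStore {

    /** Writes the bytes into a mapped region starting at offset 0. */
    static void write(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE grows the file to the requested size if needed.
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buf.put(data);
        }
    }

    /** Maps the whole file read-only and copies its contents out. */
    static byte[] read(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }
}
```

The calling code never decides when to hit the disk; it treats the buffer as
memory and leaves paging to the operating system.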

> > * If serialization can be efficiently achieved, do we still need eager
> > layout processing at all?
>
> If the processing is done on demand, e.g. over a network, you want the
> result to start as soon as possible, so I would say yes. I don't see how
> patient with caching could ever be as fast as eager.

I agree, both with the conclusion and the reasoning. The reason I asked the
question is because of its effect on our high-level API / control /
environment stuff. The stuff I added to the Avalonization wiki:
http://nagoya.apache.org/wiki/apachewiki.cgi?FOPAvalonization
(Startup Concepts Proposal) is tied in with this as well. Right now, control
of what gets done when is hard-wired into our processing subsystems
(FOTreeBuilder fires up the layout process for example). I am thinking we
need to pull that control up to higher-level objects that are looking at the
big picture -- eager vs. patient processing, multiple rendering contexts,
multiple output formats, etc.

> > * If so, should we let the user choose either eager or patient processing?
> > Eager processing might result in occasional non-conformant behavior or less
> > perfect output, but might be very suitable for many applications. Patient
> > processing might require more memory, more disk i/o and more processing, but
> > might be more suitable for other situations.
>
> If you are saying that eager may result in non-conformant behavior, then
> are you suggesting that eager must cut corners? What about asking: should
> we make it possible to have quick and dirty processing - or - most
> conformant processing?
> So yes.

Well it seems to me that yes, if you allow arbitrary input size (which I
think is good), eager may have to cut corners. Even disregarding the
optimized layout, the 5 million row table I mentioned earlier is a good
example. You either need to parse the whole thing at least twice, or you
have to keep the whole thing available, or you have to cut some corners.
Even with the forward reference problem you really cut corners. If you
assume a 3-digit page number, and it turns out to be a 7-digit page number,
something could very well suffer. Also, some future version of the standard
is going to allow for things like "See <ref-text> on page <ref-page>", where
<ref-text> will be some arbitrary string that you don't know the size of
until you get there.
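
The page-number case can be sketched in a few lines of Java. This is a
deliberately simplified illustration: ForwardRefSlot is a hypothetical name,
and it measures width in digits rather than in real glyph widths.

```java
// Hypothetical illustration, not FOP code: an eager layout reserves room for
// a page number it does not know yet, then checks the real number later.
class ForwardRefSlot {
    private final int reservedDigits;

    ForwardRefSlot(int reservedDigits) {
        this.reservedDigits = reservedDigits;
    }

    /** True if the resolved (positive) page number fits the reserved space. */
    boolean fits(int pageNumber) {
        return String.valueOf(pageNumber).length() <= reservedDigits;
    }
}
```

With three digits reserved, page 999 fits and page 1234567 does not. In the
second case an eager layout either lets the text overflow or has to redo
layout that it has already pushed down the pipeline.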

> > * If both are needed, can we accommodate that in our redesign?
>
> I believe we can.

Do you also agree that this implies giving control over our subsystems to
some higher-level object? So, using the terms I proposed on the wiki,
Document says to FOTreeBuilder, "get me a page sequence". Then instead of
FOTreeBuilder starting layout, it returns control to Document, which then
decides (among other things) whether to tell layout to throw away or keep
FOTree objects as it lays them out. If there is more than one rendering
context to be processed, it may be more efficient to serialize the FOTree,
even in an eager processing model. And, similarly, for output formats using
the same rendering context (say PDF and PostScript), someone needs to control
the process of either getting both outputs at the same time, or keeping the
AreaTree and processing them in sequence.
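
A rough sketch of that control flow in Java, purely to illustrate the idea.
Every name below (ControlSketch, Mode, PageSequence, TreeBuilder, Document)
is a hypothetical stand-in and not an existing FOP class, and "layout" is
reduced to a log entry.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins only; none of these are real FOP classes.
class ControlSketch {
    enum Mode { EAGER, PATIENT }

    /** Stands in for one parsed page sequence of the FO tree. */
    static class PageSequence {
        final String id;
        PageSequence(String id) { this.id = id; }
    }

    /** Hands back page sequences one at a time; it never starts layout. */
    static class TreeBuilder {
        private final Iterator<String> ids;
        TreeBuilder(List<String> ids) { this.ids = ids.iterator(); }
        PageSequence next() {
            return ids.hasNext() ? new PageSequence(ids.next()) : null;
        }
    }

    /** The higher-level object that decides what gets done when. */
    static class Document {
        final List<PageSequence> retained = new ArrayList<>();
        final List<String> log = new ArrayList<>();

        void process(TreeBuilder builder, Mode mode, int renderingContexts) {
            PageSequence seq;
            while ((seq = builder.next()) != null) {
                // Keep FO tree objects when processing is patient, or when
                // more than one rendering context will need them.
                if (mode == Mode.PATIENT || renderingContexts > 1) {
                    retained.add(seq);
                }
                // Record one layout pass per rendering context.
                for (int ctx = 0; ctx < renderingContexts; ctx++) {
                    log.add("layout " + seq.id + " ctx" + ctx);
                }
            }
        }
    }
}
```

The point is only that the keep-or-discard decision lives in Document, not in
the tree builder or in layout.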

> > * Can we remove the design requirement in our doc that pages be rendered
> > ASAP? My vote:
> > +1
>
> I think the ASAP statement is referring to what happens to a page that is
> added to the area tree and, as I said above, that is not related to the
> layout process apart from resolving forward references.
> Maybe the statement could be more specific, but I think we should keep it
> there.

I understand what you are saying. Actually "ASAP" still leaves room for
patient processing -- it just isn't Possible As Soon As it is for eager. I
just want to make sure that this statement doesn't back us into an
always-eager model.

Thanks again for your response.

Victor Mote

