You are viewing a plain text version of this content. The canonical link for it is here.

Posted to fop-dev@xmlgraphics.apache.org by Vincent Hennebert <vi...@anyware-tech.com> on 2007/08/02 12:13:36 UTC

Improving Keeps and Breaks

Hi all,

Caution, long post ;-)

I’ve been thinking about the handling of keeps and breaks in tables for 
a while, and it seems to me that improvements could be done in that 
whole area. I’ll use break-before as an example but what I’ll be saying 
applies to break-after and keeps as well.

Currently, for every object to which the break-before property applies, 
the value of the property is checked at the beginning of the 
getNextKnuthElements method and, if it corresponds to a hard-break, 
a Knuth penalty with -inf value is produced. This has several 
shortcomings:
- this leads to much code duplication;
- if, say, an fo:block has another fo:block as a child and both have 
  break-before, two penalties will be generated instead of one (although 
  I’m suddenly not so sure of that, more later)
- this makes it difficult to improve breaks to distinguish columns from 
  pages, and keeps to take integer values into account.

In fact, we don’t so much care about the penalty element itself as 
whether there /should/ be a break between elements or not. I mean, the 
LMs which actually care are those which concatenate the elements: 
fo:flow, fo:block, fo:block-container, fo:table-cell, etc. And they all 
do it the same way:
    iterate over the children LMs
    for each LM:
        if there is a following LM, then:
            if the current LM has break-after or the following LM has 
            break-before, then
                generate a Knuth forced break

So the main question is to know whether there is a break before a LM. 
And here that’s quite simple, there are only a few shared behaviours: 
indeed there is a break-before on an element if:
- the break-before property is set on the element itself or the first of 
  its children:
  fo:block, fo:block-container, fo:list-block
- it is set on the element itself or any of its children:
  fo:table-row, fo:list-item
- it is set on the first of its children:
  fo:table-caption, fo:table-cell, fo:list-item-body, 
  fo:list-item-label, fo:wrapper
- special:
  - fo:table: on itself or first table-body
  - fo:table-body: on the first fo:table-row child if any; otherwise on 
    any of fo:table-cell children making the first row
- not applicable:
  fo:table-column, fo:table-header/footer, fo:float, fo:footnote-body

So there would just be a couple of methods to write, for each behaviour. 
We could (for the moment) define a dedicated class with static methods, 
which would be called by each LM (some methods are imaginary, but you 
get the idea):

    public class KeepsAndBreaksHandler {

        public static boolean breakBeforeSelfOrFirstChild(LM lm) {
            return lm.getProperty(break-before).isForced() ||
                    lm.getFirstChild().getProperty(break-before).isForced();
        }

        public static boolean breakBeforeSelfOrAnyChild(LM lm) {
            ...
        }

        public static boolean breakBeforeFirstChild(LM lm) {
            ...
        }
        ...
    }

The method mustKeepWithPrevious would become:
- for BlockLM:
    return KeepsAndBreaksHandler.breakBeforeSelfOrFirstChild(this)
- for TableCellLM:
    return KeepsAndBreaksHandler.breakBeforeFirstChild(this)
etc.

The static method wouldn’t even need to be made synchronized, as the 
class is stateless. Later on we could replace this class with 
a Singleton object that would be passed to the LMs at their creation, 
and we would have some kind of “KeepsAndBreaksHandlerFactory”. But for 
now a simple class with static methods would be sufficient.

We could also put in this class the code for producing the penalties 
elements:
    KnuthElement getElementForBreakBetween(LM firstLM, LM secondLM) {
        if (firstLM.hasBreakAfter() || secondLM().hasBreakBefore()) {
            return new KnuthPenalty(...);
        } else if (firstLM.mustKeepWithNext() || secondLM.mustKeepWithPrevious()) {
            return new KnuthForbiddenBreak()
        }
    }
or something like that.

The benefit would be that the whole handling of breaks can be found in 
just one place, instead of being spread among all the LMs. This is 
easier to correct bugs; this is easier to implement new features (column 
vs page break, integer keeps...); this simplifies the 
getNextKnuthElements methods of the LMs. And so on...


WDYT?
Vincent

Re: Improving Keeps and Breaks

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Oct 19, 2007, at 13:32, Vincent Hennebert wrote:

> Andreas L Delmelle wrote:
>> On Oct 18, 2007, at 19:23, Vincent Hennebert wrote:
> <snip/>
>>> I think I see your point. Basically you’re proposing a push  
>>> method (a LM
>>> notifies its parent LM that it has a break-before) while mine is  
>>> a pull
>>> method (a LM asks its children LMs if they have break-before).
>>
>> Yep, although it would not be the LM but rather the FO that pushes  
>> the
>> break-before upwards to its parent if it is also the first child. The
>
> Sure, of course. BTW, that may be a dumb question, but: how does a FO
> know that it is the first child of its parent? AFAICT there’s no such
> information in the FObjs.

Oh, that's pretty easy. Each FObj has a firstChild member, so the  
most straightforward idiom for this would be something like:

if (parent.firstChild == null) { ... } //if checking before  
addChildNode() has been called
if (this == parent.firstChild) { ... } //if you check after  
addChildNode() has been called

> Which means that an object would blindly
> notify its parent that it has a break, and that would be up to the
> parent to figure out whether it should take it into account or not.
> Would be a bit overkill, wouldn’t it?

No, it wouldn't.

Or at least: depends on what you mean exactly by 'overkill'?

>> LMs would largely continue to work as they do now, except that  
>> under a
>> certain set of conditions, they don't need to check the outside  
>> anymore:
>> only take into account the forced break on its own FO. If there is  
>> none,
>> then no need to recursively check for first descendants having forced
>> breaks.
>
> Of course, but I’m concerned about spreading code relevant to the same
> functionality over several places of the codebase.

That will always be a concern. If you separate some related logic  
into YAUC[*], the same reasoning applies, IMO. Even if the logic is  
contained in a separate class, you still have to look at/know of the  
other classes.

[*] (yet-another-utility-class)

>> <snip />
>> Keeping in mind the above mentioned idea of triggering layout  
>> sooner, if
>> we can guarantee that the layoutengine always receives complete rows,
>> then the table-layout job should become a bit simpler in the general
>> use-case, while still not adding much complexity in trickier, more
>> exotic cases, like:
>> //table-cell/block[position() > 1][@break-before='page']
>
> That one triggers a break /inside/ the table-row, not before it.

What I meant precisely... and, in practice, this will occur far less  
often than forced breaks on the first block in a cell, IIC.

> Anyway, at a given LM level the work to do looks simple enough to me.
>
>
>> especially where the cell's column-number corresponds to the highest
>> column-number.
>>
>> Triggering layout sooner is the only way we are ever going to get  
>> FOP to
>> accept arbitrarily large tables, without consuming massive amounts of
>> heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes
>> (table-cells must have at least one block each) that stay in memory
>> until the page-sequence is completely finished. I wonder how many
>> break-possibilities that generates... :/
>
> Like said above, I don’t think anything in my approach prevents that
> from happening.

Indeed not. Did I say anything else?

>>> A matter of taste, probably, but I think I’d prefer the pull  
>>> method: the
>>> LM performs requests to the appropriate children LMs exactly when  
>>> and if
>>> needed.
>>
>> The only thing an LM should initially pull/request from its children,
>> AFAIU, is a list of elements, given a certain LayoutContext.
>> When composing its own element list, an LM should ideally be able to
>> rely on the lists it receives from its children. Then add/delete/ 
>> update
>> elements and (un)wrap, depending on context that is unknown or
>> irrelevant to the child.
>
> That one I don’t quite agree with. Although I thought of it too on
> a first step. I think it’s more complicated to play with a list of
> elements, try and get the first one from children if any, create a new
> one if necessary, change the break context (column, page) if needed,
> etc., rather than simply request the applicable children LMs for their
> break-before values. And again, in the case of tables that means that
> the merging algorithm needs to deal with many possibilities of break
> elements to occur. That’s really not its job I think.
>
>
>>> That may simplify code as well (and improve its readability) as
>>> some form of pull method is necessary anyway (the
>>> mustKeepWithPrevious/WithNext/Together methods).
>>
>> Keeps are a different story indeed. Big difference is that keeps have
>> strengths, and breaks do not.
>
> Yes, but keeps and breaks are handled at the same place, mainly, when
> a LM considers the stacking of children LMs (BlockLM, for example).  
> And
> the treatment is very similar.

The point is that the layoutengine/-algorithm can focus on the keeps,  
since the breaks would have been 'normalized' (since it is very easy  
to do so; for keeps this much less straightforward, since the  
strengths need to be considered as well, so that might indeed lead to  
unnecessary overhead when done too early)

> <snip />
>>
>> I wouldn't really see this as a problem. The related methods will  
>> never
>> be called, unless there is a flaw in our logic[*]. To stress the fact
>> that they serve no purpose there, we could add overrides that always
>> return false.
>
> That really looks like bad design to me. If some methods don’t  
> apply to
> an object then that object shouldn’t inherit the related
> class/interface.

I don't see this as bad design, just a few subclasses for which the  
inherited methods are useless/not applicable. That will happen in  
virtually any context. Reality is /very/ badly designed. Alas, no  
developer to take the blame for that. ;-)

Seriously, as you said, to a great extent this is a matter of taste.

>> <snip />
>> That's a special case, since in principle a graphic does not itself
>> consist of more layout-objects that need to be stacked. To the
>> layoutengine, a graphic is simply a monolithic box. Graphics are  
>> inline
>> by definition nonetheless, so it could be InlineStackingLM with  
>> the same
>> reservations as for FlowLM and StaticContentLM, but for other methods
>> (the actual 'inline-stacking' can be considered to be delegated to  
>> the
>> producer of the graphic, here).
>
> So it’s not stacking, but stacked. Might make sense to introduce that
> concept, actually.

Mmmmno, I think...
What I mean is: from the layoutengine's point-of-view, the elements  
of the graphic don't matter. If you include SVG, FOP calls on Batik,  
passing a reference to the SVG and the viewport dimensions. With a  
little imagination, you could say that Batik 'stacks' the graphic's  
elements properly (compliant to the SVG Rec). That is not a concern  
for a GraphicLM like it is for an InlineLM.

> To sum up, my main concern is to find code for a same functionality at
> several places of the codebase. Some form of treatment is necessary at
> the LM level anyway, so why not just put all the code there? It may  
> also
> be good to keep FO tree building code as much independent as possible
> from the layout code. Simpler, easier to understand, easier to debug,
> easier to replace a component which another one, etc. The  
> collaborative
> approach you’re proposing looks interesting, but for practical reasons
> it may be best to keeps things separated. The codebase is quite large
> and it’s difficult to have a detailed understanding of all its parts.
> That might be good to be able to concentrate on just one part. Easier
> for newcomers, too.
>
>
> WDYT?

Good enough for me. I certainly did not mean to keep you from going  
about it. As long as it's an improvement when compared to the current  
situation, I don't really have /strong/ feelings on how it /should/  
be done.


Cheers

Andreas

Re: Improving Keeps and Breaks

Posted by Vincent Hennebert <vi...@anyware-tech.com>.

Hi Andreas,

Andreas L Delmelle wrote:
> On Oct 18, 2007, at 19:23, Vincent Hennebert wrote:
<snip/>
>> I think I see your point. Basically you’re proposing a push method (a LM
>> notifies its parent LM that it has a break-before) while mine is a pull
>> method (a LM asks its children LMs if they have break-before).
> 
> Yep, although it would not be the LM but rather the FO that pushes the
> break-before upwards to its parent if it is also the first child. The

Sure, of course. BTW, that may be a dumb question, but: how does a FO 
know that it is the first child of its parent? AFAICT there’s no such 
information in the FObjs. Which means that an object would blindly 
notify its parent that it has a break, and that would be up to the 
parent to figure out whether it should take it into account or not. 
Would be a bit overkill, wouldn’t it?

> LMs would largely continue to work as they do now, except that under a
> certain set of conditions, they don't need to check the outside anymore:
> only take into account the forced break on its own FO. If there is none,
> then no need to recursively check for first descendants having forced
> breaks.

Of course, but I’m concerned about spreading code relevant to the same 
functionality over several places of the codebase.

> Currently (sorry if it becomes boring to stress this) the construction
> of the layout-tree starts only when the end-of-page-sequence event
> occurs. I still see room for changing this in the future, and so I need
> to consider the effects on the layout-algorithm as well: the algorithm
> will, for instance, no longer be able to rely on *all* childLMs being
> available the first time it enters the loop... The last childLM in an
> iteration might turn out to be not-the-last-one-after-all. For many
> following FONodes, the LMs do not exist yet at that point. Not in my
> head, at least. ;-)

I think nothing would prevent the layout process from being started 
earlier. On most FOs only the first child needs to be checked for 
a break. And for a table, only the first row needs to be retrieved in 
order to know if a break must occur before it. That’s another topic, but 
we could imagine that FObj instances and corresponding LMs are 
dynamically created when requested by a parent LM. Whereas at the 
(child) FObj level, it’s unclear to me how we will be able to say that 
we know enough to start the layout, and that we don’t need to grab 
further children FObjs.

> Anyway, I remember that when I implemented implicit column-numbers, I
> also gave TableBody an instance member to check whether we are adding
> cells in the first row or not, so this particular case would be easily
> addressed. (Checking... yep, it's still there.)
> 
> Come to think of tables, I'd consider 'propagation' in terms of pushing
> a forced break on a cell to the first cell in the row.
> In the table-layout code, at the point where we have a reference to the
> row or the first cell in a row, we would immediately know whether there
> is a forced break on a first descendant in any of the following sibling
> cells without having to request the corresponding childLMs and trigger a
> tree-traversal of who-knows-how-many levels.
> 
> Keeping in mind the above mentioned idea of triggering layout sooner, if
> we can guarantee that the layoutengine always receives complete rows,
> then the table-layout job should become a bit simpler in the general
> use-case, while still not adding much complexity in trickier, more
> exotic cases, like:
> //table-cell/block[position() > 1][@break-before='page']

That one triggers a break /inside/ the table-row, not before it. Anyway, 
at a given LM level the work to do looks simple enough to me.

> especially where the cell's column-number corresponds to the highest
> column-number.
> 
> Triggering layout sooner is the only way we are ever going to get FOP to
> accept arbitrarily large tables, without consuming massive amounts of
> heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes
> (table-cells must have at least one block each) that stay in memory
> until the page-sequence is completely finished. I wonder how many
> break-possibilities that generates... :/

Like said above, I don’t think anything in my approach prevents that 
from happening.

>> A matter of taste, probably, but I think I’d prefer the pull method: the
>> LM performs requests to the appropriate children LMs exactly when and if
>> needed.
> 
> The only thing an LM should initially pull/request from its children,
> AFAIU, is a list of elements, given a certain LayoutContext.
> When composing its own element list, an LM should ideally be able to
> rely on the lists it receives from its children. Then add/delete/update
> elements and (un)wrap, depending on context that is unknown or
> irrelevant to the child.

That one I don’t quite agree with. Although I thought of it too on 
a first step. I think it’s more complicated to play with a list of 
elements, try and get the first one from children if any, create a new 
one if necessary, change the break context (column, page) if needed, 
etc., rather than simply request the applicable children LMs for their 
break-before values. And again, in the case of tables that means that 
the merging algorithm needs to deal with many possibilities of break 
elements to occur. That’s really not its job I think.

>> That may simplify code as well (and improve its readability) as
>> some form of pull method is necessary anyway (the
>> mustKeepWithPrevious/WithNext/Together methods).
> 
> Keeps are a different story indeed. Big difference is that keeps have
> strengths, and breaks do not.

Yes, but keeps and breaks are handled at the same place, mainly, when 
a LM considers the stacking of children LMs (BlockLM, for example). And 
the treatment is very similar.

> Consider:
> 
> <fo:block id="b1">
>   ...
>   <fo:block id="b2">
>     <fo:block id="b3" keep-with-previous.within-page="...">
>       <fo:block id="b4">
>         <fo:block id="b5" break-before="page">
> 
> This may be interpretation: you cannot specify a 'strength' for a break.
> It is either there or not. I take this to mean that a forced break
> overrules any keep.

Indeed, Section 4.8 of XSL 1.1.

<snip/>
>> - the code would be about the same for Block- and InlineStackingLM
>> - we could factorize it into a common super-class
> 
> 
> AbstractStackingLM...?

Rather StackedLM. That’s the StackingLM that would have StackedLM 
children implementing the necessary methods. Ideally that would be an 
abstract class, but since multiple inheritance is impossible in Java I’m 
not sure that would be feasible. Hence static methods in a separate 
class.

> I kind of like the idea. For the really shared portions,
> AbstractStackingLM could then implement a set of static methods.
> 
>> but both those classes
>>   have subclasses to which breaks don’t apply (Flow-, StaticContentLM,
>>   for example).
> 
> I wouldn't really see this as a problem. The related methods will never
> be called, unless there is a flaw in our logic[*]. To stress the fact
> that they serve no purpose there, we could add overrides that always
> return false.

That really looks like bad design to me. If some methods don’t apply to 
an object then that object shouldn’t inherit the related 
class/interface.

> [*] (They won't be called, precisely because breaks don't apply?)
> 
>> OTOH keeps apply to AbstractGraphicsLM which doesn’t
>>   inherit any of those classes.
> 
> That's a special case, since in principle a graphic does not itself
> consist of more layout-objects that need to be stacked. To the
> layoutengine, a graphic is simply a monolithic box. Graphics are inline
> by definition nonetheless, so it could be InlineStackingLM with the same
> reservations as for FlowLM and StaticContentLM, but for other methods
> (the actual 'inline-stacking' can be considered to be delegated to the
> producer of the graphic, here).

So it’s not stacking, but stacked. Might make sense to introduce that 
concept, actually.

To sum up, my main concern is to find code for a same functionality at 
several places of the codebase. Some form of treatment is necessary at 
the LM level anyway, so why not just put all the code there? It may also 
be good to keep FO tree building code as much independent as possible 
from the layout code. Simpler, easier to understand, easier to debug, 
easier to replace a component which another one, etc. The collaborative 
approach you’re proposing looks interesting, but for practical reasons 
it may be best to keeps things separated. The codebase is quite large 
and it’s difficult to have a detailed understanding of all its parts. 
That might be good to be able to concentrate on just one part. Easier 
for newcomers, too.

WDYT?
Vincent

Re: Improving Keeps and Breaks

Posted by ge...@ogersoft.at.

Andreas L Delmelle schrieb:

> Triggering layout sooner is the only way we are ever going to get FOP to 
> accept arbitrarily large tables, without consuming massive amounts of 
> heap.

It is little OT, but multipass would be another way - at the cost of 
runtime and diskspace/-access (if the memory footprint should be shrinked).
I know it is not a issue now, but I sometimes think [1] of how the 
"formating objects for indexing" (section 6.10 of xsl 1.1) could be 
implemented and it always ends in a multipass solution because in 
contrast to page-number-citation it is hardly possible to guess how much 
space to reserve for the resulting index-page-citation-list.

Indexing would be a great help for my project - I do it currently by 
creating as much page-number-citations as needed but without beeing able 
to use any advanced features, so I end up with long lists of pagenumbers 
that could/should be merged to page-ranges. It is not realy a big 
problem because it is a "private" project, but I want to keep the door 
open for a better solution that is contained in the standard.

Nevertheless resolving as much as possible at the FO tree building stage 
and trigger an earlier layout process would be an advantage for most use 
cases.


gerhard

[1] Yes, "thinking" is all what I do at present ;-)))

Re: Improving Keeps and Breaks

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Oct 18, 2007, at 19:23, Vincent Hennebert wrote:

>> <snip />
>> OTOH, the above is semantically equivalent to (I think we had already
>> established that there should not be a double page-break here)
>>
>> <fo:block break-before="page">
>>   <fo:block>
>>     <fo:block>
>>
>> If the LMs would be guaranteed to receive the 'normalized' form, the
>> break-condition can be tested for internally by the outer LM  
>> itself. No
>> need to look forward or back... The first descendants wouldn't  
>> even need
>> to check for breaks anymore.
>
> I think I see your point. Basically you’re proposing a push method  
> (a LM
> notifies its parent LM that it has a break-before) while mine is a  
> pull
> method (a LM asks its children LMs if they have break-before).

Yep, although it would not be the LM but rather the FO that pushes  
the break-before upwards to its parent if it is also the first child.  
The LMs would largely continue to work as they do now, except that  
under a certain set of conditions, they don't need to check the  
outside anymore: only take into account the forced break on its own  
FO. If there is none, then no need to recursively check for first  
descendants having forced breaks.

Currently (sorry if it becomes boring to stress this) the  
construction of the layout-tree starts only when the end-of-page- 
sequence event occurs. I still see room for changing this in the  
future, and so I need to consider the effects on the layout-algorithm  
as well: the algorithm will, for instance, no longer be able to rely  
on *all* childLMs being available the first time it enters the  
loop... The last childLM in an iteration might turn out to be not-the- 
last-one-after-all. For many following FONodes, the LMs do not exist  
yet at that point. Not in my head, at least. ;-)

> You’re
> more at the FO tree building stage, I’m more at the layout stage. In
> terms of efficiency I think both methods are equivalent as the same
> amount of method calls will be performed in either way.

Right, but OTOH... it's more a matter of /when/ (in the process) that  
happens.

> The push method might be slighty more complicated to implement in
> special cases like tables: when an fo:cell notifies its parent
> fo:table-body that it has a break-before, the table-body must  
> figure out
> if the cell lies in the first row or not.

Almost everything is /slightly/ more complicated in case of  
fo:tables, especially those without explicit fo:table-rows or - 
columns. ;-)

Anyway, I remember that when I implemented implicit column-numbers, I  
also gave TableBody an instance member to check whether we are adding  
cells in the first row or not, so this particular case would be  
easily addressed. (Checking... yep, it's still there.)

Come to think of tables, I'd consider 'propagation' in terms of  
pushing a forced break on a cell to the first cell in the row.
In the table-layout code, at the point where we have a reference to  
the row or the first cell in a row, we would immediately know whether  
there is a forced break on a first descendant in any of the following  
sibling cells without having to request the corresponding childLMs  
and trigger a tree-traversal of who-knows-how-many levels.

Keeping in mind the above mentioned idea of triggering layout sooner,  
if we can guarantee that the layoutengine always receives complete  
rows, then the table-layout job should become a bit simpler in the  
general use-case, while still not adding much complexity in trickier,  
more exotic cases, like:
//table-cell/block[position() > 1][@break-before='page']

especially where the cell's column-number corresponds to the highest  
column-number.

Triggering layout sooner is the only way we are ever going to get FOP  
to accept arbitrarily large tables, without consuming massive amounts  
of heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes  
(table-cells must have at least one block each) that stay in memory  
until the page-sequence is completely finished. I wonder how many  
break-possibilities that generates... :/

>
> A matter of taste, probably, but I think I’d prefer the pull  
> method: the
> LM performs requests to the appropriate children LMs exactly when  
> and if
> needed.

The only thing an LM should initially pull/request from its children,  
AFAIU, is a list of elements, given a certain LayoutContext.
When composing its own element list, an LM should ideally be able to  
rely on the lists it receives from its children. Then add/delete/ 
update elements and (un)wrap, depending on context that is unknown or  
irrelevant to the child.

> That may simplify code as well (and improve its readability) as
> some form of pull method is necessary anyway (the
> mustKeepWithPrevious/WithNext/Together methods).

Keeps are a different story indeed. Big difference is that keeps have  
strengths, and breaks do not.

Consider:

<fo:block id="b1">
   ...
   <fo:block id="b2">
     <fo:block id="b3" keep-with-previous.within-page="...">
       <fo:block id="b4">
         <fo:block id="b5" break-before="page">

This may be interpretation: you cannot specify a 'strength' for a  
break. It is either there or not. I take this to mean that a forced  
break overrules any keep.

Main advantage to the layoutengine would be that forced breaks are  
known as early as possible: the break is either there, on the FO,  
when the LM is initialized --propagated upwards from a first child,  
maybe seven or eight levels down--, or it is not.
The above can be normalized at parse time, with only a marginal cost,  
so that the break is propagated upwards to block b2, and the keep is  
suppressed before any LM is even created.

> I believe you already mentioned this idea of normalizing/ 
> simplifying the
> FO tree in the past. Note that it may exist in parallel as it  
> addresses
> a different general issue. One concern I’d have is to make sure that
> a simplification leads to a semantically equivalent result.

That is precisely the purpose of normalization: to remove ambiguities  
at a point where it is still relatively simple. Ambiguities that  
would otherwise cause a significant amount of checks or tree- or list- 
traversals later on to get every possible scenario right. (FWIW: XEP  
also normalizes the input FO, but there it happens by means of an  
XSLT; IIRC, they normalize tables to always have columns and rows,  
for example; implicit column-numbers can also quite easily be  
computed/assigned as part of an XSL Transform)

> Given the complexity of the spec that might be difficult to  
> establish. Not sure
> also if the overhead is compensated by the gain in the further  
> processes
> (layout, area tree generation). But that’s a different topic.

The key advantage in the longer term is that the start of those  
further processes can be triggered sooner, without adding too much  
complexity to the related source code.

>> Agreed with the concerns, but I'm wondering if these portions of  
>> code,
>> instead of extracting them into a separate class, could be  
>> centralized
>> in, say, BlockStackingLM and InlineStackingLM...?
>
> I thought of that, but a separate class looked cleaner to me for some
> reasons:
> - the LMs classes are already overcrowded with many different concerns

True.

> - the code would be about the same for Block- and InlineStackingLM
> - we could factorize it into a common super-class

AbstractStackingLM...?

I kind of like the idea. For the really shared portions,  
AbstractStackingLM could then implement a set of static methods.

> but both those classes
>   have subclasses to which breaks don’t apply (Flow-, StaticContentLM,
>   for example).

I wouldn't really see this as a problem. The related methods will  
never be called, unless there is a flaw in our logic[*]. To stress  
the fact that they serve no purpose there, we could add overrides  
that always return false.

[*] (They won't be called, precisely because breaks don't apply?)

> OTOH keeps apply to AbstractGraphicsLM which doesn’t
>   inherit any of those classes.

That's a special case, since in principle a graphic does not itself  
consist of more layout-objects that need to be stacked. To the  
layoutengine, a graphic is simply a monolithic box. Graphics are  
inline by definition nonetheless, so it could be InlineStackingLM  
with the same reservations as for FlowLM and StaticContentLM, but for  
other methods (the actual 'inline-stacking' can be considered to be  
delegated to the producer of the graphic, here).

Cheers

Andreas

Re: Improving Keeps and Breaks

Posted by Vincent Hennebert <vi...@anyware-tech.com>.

Hi Andreas,

Andreas L Delmelle wrote:
> On Aug 2, 2007, at 12:13, Vincent Hennebert wrote:
> 
> Hi Vincent
> 
> Hope you still remember this one. Sorry about the late reply. Still
> catching up on some missed posts during the holidays.

Hope you still remember this one. Sorry about the late reply. Still
catching up on some missed posts while I was not working on FOP :-]

And thanks for your comments.

<snip/>
>> In fact, we don’t so much care about the penalty element itself as
>> whether there /should/ be a break between elements or not. I mean, the
>> LMs which actually care are those which concatenate the elements:
>> fo:flow, fo:block, fo:block-container, fo:table-cell, etc. And they all
>> do it the same way:
>>     iterate over the children LMs
>>     for each LM:
>>         if there is a following LM, then:
>>             if the current LM has break-after or the following LM has
>>             break-before, then
>>                 generate a Knuth forced break
>>
>> So the main question is to know whether there is a break before a LM.
>> And here that’s quite simple, there are only a few shared behaviours:
>> indeed there is a break-before on an element if:
>> - the break-before property is set on the element itself or the first of
>>   its children:
>>   fo:block, fo:block-container, fo:list-block
>> - it is set on the element itself or any of its children:
>>   fo:table-row, fo:list-item
>> - it is set on the first of its children:
>>   fo:table-caption, fo:table-cell, fo:list-item-body,
>>   fo:list-item-label, fo:wrapper
>> - special:
>>   - fo:table: on itself or first table-body
>>   - fo:table-body: on the first fo:table-row child if any; otherwise on
>>     any of fo:table-cell children making the first row
>> - not applicable:
>>   fo:table-column, fo:table-header/footer, fo:float, fo:footnote-body
> 
> I think I see another way of simplifying the necessary code in the LMs,
> but I'm not a 100% sure...
> 
> Currently, a break is not propagated upwards or normalized in any way in
> the fo-tree.
> 
> If you would have:
> 
> <fo:block>
>   <fo:block break-before="page">
>     <fo:block break-before="page">
> 
> Then we would get three nested Block instances, and the corresponding
> BlockLM for the first outer Block always needs to check its first
> childLM for the break-condition.
> 
> OTOH, the above is semantically equivalent to (I think we had already
> established that there should not be a double page-break here)
> 
> <fo:block break-before="page">
>   <fo:block>
>     <fo:block>
> 
> If the LMs would be guaranteed to receive the 'normalized' form, the
> break-condition can be tested for internally by the outer LM itself. No
> need to look forward or back... The first descendants wouldn't even need
> to check for breaks anymore.

I think I see your point. Basically you’re proposing a push method (a LM 
notifies its parent LM that it has a break-before) while mine is a pull 
method (a LM asks its children LMs if they have break-before). You’re 
more at the FO tree building stage, I’m more at the layout stage. In 
terms of efficiency I think both methods are equivalent as the same 
amount of method calls will be performed in either way.

The push method might be slighty more complicated to implement in 
special cases like tables: when an fo:cell notifies its parent 
fo:table-body that it has a break-before, the table-body must figure out 
if the cell lies in the first row or not.

A matter of taste, probably, but I think I’d prefer the pull method: the 
LM performs requests to the appropriate children LMs exactly when and if 
needed. That may simplify code as well (and improve its readability) as 
some form of pull method is necessary anyway (the 
mustKeepWithPrevious/WithNext/Together methods).

I believe you already mentioned this idea of normalizing/simplifying the 
FO tree in the past. Note that it may exist in parallel as it addresses 
a different general issue. One concern I’d have is to make sure that 
a simplification leads to a semantically equivalent result. Given the 
complexity of the spec that might be difficult to establish. Not sure 
also if the overhead is compensated by the gain in the further processes 
(layout, area tree generation). But that’s a different topic.


>> So there would just be a couple of methods to write, for each behaviour.
>> We could (for the moment) define a dedicated class with static methods,
>> which would be called by each LM (some methods are imaginary, but you
>> get the idea):
> <snip />
> 
> Interesting idea, too, but...
> 
>> The benefit would be that the whole handling of breaks can be found in
>> just one place, instead of being spread among all the LMs. This is
>> easier to correct bugs; this is easier to implement new features (column
>> vs page break, integer keeps...); this simplifies the
>> getNextKnuthElements methods of the LMs. And so on...
> 
> Agreed with the concerns, but I'm wondering if these portions of code,
> instead of extracting them into a separate class, could be centralized
> in, say, BlockStackingLM and InlineStackingLM...?

I thought of that, but a separate class looked cleaner to me for some 
reasons:
- the LMs classes are already overcrowded with many different concerns
- the code would be about the same for Block- and InlineStackingLM
- we could factorize it into a common super-class but both those classes 
  have subclasses to which breaks don’t apply (Flow-, StaticContentLM, 
  for example). OTOH keeps apply to AbstractGraphicsLM which doesn’t 
  inherit any of those classes.
- a separate class allows to easily change/improve/replace the 
  implementation
- and, anyway, it’s obviously better like this...


Vincent

Re: Improving Keeps and Breaks

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 2, 2007, at 12:13, Vincent Hennebert wrote:

Hi Vincent

Hope you still remember this one. Sorry about the late reply. Still  
catching up on some missed posts during the holidays.

> I’ve been thinking about the handling of keeps and breaks in tables  
> for
> a while, and it seems to me that improvements could be done in that
> whole area. I’ll use break-before as an example but what I’ll be  
> saying
> applies to break-after and keeps as well.

While I've also been looking into that direction myself, I'm not sure  
I agree with all of your suggestions (more inline below).

> Currently, for every object to which the break-before property  
> applies,
> the value of the property is checked at the beginning of the
> getNextKnuthElements method and, if it corresponds to a hard-break,
> a Knuth penalty with -inf value is produced. This has several
> shortcomings:
> - this leads to much code duplication;
> - if, say, an fo:block has another fo:block as a child and both have
>   break-before, two penalties will be generated instead of one  
> (although
>   I’m suddenly not so sure of that, more later)
> - this makes it difficult to improve breaks to distinguish columns  
> from
>   pages, and keeps to take integer values into account.

Very right about the above, and note that this is not only true for  
block-level breaks and keeps (= page- or column-breaks), but also  
inline-level, where at the moment, the situation is worse. keep- 
together.within-line="always" only works for text-descendants of the  
same FObj. Forcing a keep-with-previous on a graphic or an fo:inline  
to have it always kept on the same line as the last line generated by  
preceding PCDATA is currently impossible with FOP.

> In fact, we don’t so much care about the penalty element itself as
> whether there /should/ be a break between elements or not. I mean, the
> LMs which actually care are those which concatenate the elements:
> fo:flow, fo:block, fo:block-container, fo:table-cell, etc. And they  
> all
> do it the same way:
>     iterate over the children LMs
>     for each LM:
>         if there is a following LM, then:
>             if the current LM has break-after or the following LM has
>             break-before, then
>                 generate a Knuth forced break
>
> So the main question is to know whether there is a break before a LM.
> And here that’s quite simple, there are only a few shared behaviours:
> indeed there is a break-before on an element if:
> - the break-before property is set on the element itself or the  
> first of
>   its children:
>   fo:block, fo:block-container, fo:list-block
> - it is set on the element itself or any of its children:
>   fo:table-row, fo:list-item
> - it is set on the first of its children:
>   fo:table-caption, fo:table-cell, fo:list-item-body,
>   fo:list-item-label, fo:wrapper
> - special:
>   - fo:table: on itself or first table-body
>   - fo:table-body: on the first fo:table-row child if any;  
> otherwise on
>     any of fo:table-cell children making the first row
> - not applicable:
>   fo:table-column, fo:table-header/footer, fo:float, fo:footnote-body

I think I see another way of simplifying the necessary code in the  
LMs, but I'm not a 100% sure...

Currently, a break is not propagated upwards or normalized in any way  
in the fo-tree.

If you would have:

<fo:block>
   <fo:block break-before="page">
     <fo:block break-before="page">

Then we would get three nested Block instances, and the corresponding  
BlockLM for the first outer Block always needs to check its first  
childLM for the break-condition.

OTOH, the above is semantically equivalent to (I think we had already  
established that there should not be a double page-break here)

<fo:block break-before="page">
   <fo:block>
     <fo:block>

If the LMs would be guaranteed to receive the 'normalized' form, the  
break-condition can be tested for internally by the outer LM itself.  
No need to look forward or back... The first descendants wouldn't  
even need to check for breaks anymore.

> So there would just be a couple of methods to write, for each  
> behaviour.
> We could (for the moment) define a dedicated class with static  
> methods,
> which would be called by each LM (some methods are imaginary, but you
> get the idea):
<snip />

Interesting idea, too, but...

> The benefit would be that the whole handling of breaks can be found in
> just one place, instead of being spread among all the LMs. This is
> easier to correct bugs; this is easier to implement new features  
> (column
> vs page break, integer keeps...); this simplifies the
> getNextKnuthElements methods of the LMs. And so on...

Agreed with the concerns, but I'm wondering if these portions of  
code, instead of extracting them into a separate class, could be  
centralized in, say, BlockStackingLM and InlineStackingLM...?

Cheers

Andreas