Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2005/09/20 23:50:12 UTC

Collapsing borders/Tables: Knuth element generation questions (possible ideas?)

Hi,

Jeremias, Luca or Simon will probably be able to make the most sense 
out of it, but if there's anyone else that can add a few comments, feel 
free to do so.
(FYI: This is completely separate from my idea to move the 
border-collapsing to the FOTree.)

Now, I'm still not fully at home in the Knuth element generation 
algorithm, so I don't know exactly whether what I'm about to describe 
is at all feasible/doable. Maybe it's currently already done this way, 
and I'm missing the point somewhere... In that case: sorry for the 
noise. :-/

Here goes:
I get the impression that the elements for borders and those for the 
content of the cells are created in one single pass, which seems to be 
the source of the so-called 'interaction problem' --IIC, this refers to 
the situation where, for example, we have already generated the AFTER 
border elements for the first two cells, while it's only when 
generating the elements for the third cell that a break is triggered. 
So, the obtained border- and content-elements become invalid, and need 
to be re-evaluated (possibly taking the footer into account).
Is this a correct assessment of the issue?
Am I correct when I say that this problem doesn't pose itself when the 
break would occur in the first cell of the row(group)?

If so, I'm wondering whether it could help if the element generation 
for row(groups) were split up into two (possibly three) passes and 
made to look like the following (in pseudo-code):

while( rowIterator.hasNext() ) {
   if( firstRowGroupInPageOrColumn ) {
     generateBeforeBorderElements();
   }
   generateAfterBorderElements();
   generateContentElements();
}

So, by the time we get to generating boxes/glues/penalties for the 
content of the cells, we would already have the minimum/maximum widths 
for *all* possible AFTER border elements in the row.
The generateAfterBorderElements() step would create two element lists:
- one to use if there is no page- or column-break
- an alternate list to use in case the content triggers a break (which 
would then include all elements for the footer, if any)

Maybe both lists could be made to include the elements for the AFTER 
padding as well (? since we have to iterate over the cells/grid-units 
anyway).

Eventually only one of the two lists will be merged with the content 
element list, depending on the situation after the content element list 
is completely known, but it would become a matter of inserting the right 
list (and discarding the incorrect one --at least, throwing away its 
elements).

The only drawback I immediately see is that the 
generateAfterBorderElements() step would have to make the comparison 
with the footer- or table-borders for each and every row, unless we 
were to do this only in case the remaining page- or column-BPD has 
dropped below a certain threshold.

The only remaining problems would then be that:
a) there may be row(groups) whose content is so large that the 
remaining BPD is more than enough before the content's elements are 
generated, but only drops below the threshold during the 
generateContentElements() step.
b) there's always the possibility of a forced break, regardless of the 
remaining BPD

The creation of the alternate element list should therefore be 
implemented as a separate step that can be triggered either during 
generateAfterBorderElements() or generateContentElements().

In any case, besides gaining certainty about min- or max-border-widths, 
splitting up the element generation in 2-3 passes would allow us to 
gain a few hints on the content to get an idea of the probability of a 
page- or column-break.
I mean: without actually triggering creation of a full element list for 
the content, we could maybe do a quick traverse of the FOTree-fragment 
contained in each cell to see if any of its descendants have a break-* 
property specified.
To make an even more educated guess, perhaps we could even perform some 
off-hand calculations based on the average font-size, the number of 
blocks, the number of characters of the descendant FOText nodes, the 
content-height for contained images... But this all *without* 
generating the elements. Only minimal communication with the actual 
childLMs in that step, placing the focus on the FONode-elements (= the 
list returned by TableCell.getChildNodes()) and their properties.
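
The "quick traverse" idea above can be sketched as follows. This is only 
an illustration: Node is a hypothetical stand-in for an FO tree node, and 
the hasBreakProperty flag is an assumption standing in for "break-before 
or break-after is set to something other than auto". It does not use 
FOP's actual FONode API.

```java
import java.util.ArrayList;
import java.util.List;

public class BreakHintScan {

    /** Hypothetical stand-in for an FO tree node; NOT FOP's FONode class. */
    static class Node {
        final String name;
        // Assumption: true when a break-* property forces a break here.
        final boolean hasBreakProperty;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean hasBreakProperty) {
            this.name = name;
            this.hasBreakProperty = hasBreakProperty;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Depth-first scan of a subtree. Returns as soon as one node with a
     * forced break is found; no layout elements are generated.
     */
    static boolean containsForcedBreak(Node node) {
        if (node.hasBreakProperty) {
            return true;
        }
        for (Node child : node.children) {
            if (containsForcedBreak(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node cell = new Node("table-cell", false)
                .add(new Node("block", false)
                        .add(new Node("block", true)));
        System.out.println(containsForcedBreak(cell));
    }
}
```

The scan only touches tree nodes and their properties, which matches the 
"without generating the elements" constraint in the paragraph above.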


Does this make any sense?


Cheers,

Andreas

Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)

Posted by Manuel Mall <mm...@arcus.com.au>.

On Wed, 21 Sep 2005 04:57 pm, Jeremias Maerki wrote:
> On 21.09.2005 09:52:00 Manuel Mall wrote:
> > On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote:
>
> <snip/>
>
> >
> > Jeremias, can you explain to me why we have to reevaluate?
>
> Let me explain by showing the flow of events for a simple block with
> a long text in it:
>
<snip/>

> It's not that immediate as the LineLM has to do the line breaking
> before the page breaking can be done.

Ok, I see: we do all the line breaking first, followed by the page 
breaking. Therefore the LineLM, when creating a break, has no idea 
whether this will become a page break and "happily" continues line 
breaking with the given IPD.

>
> > Yes, the question on the
> > new IPD when asked of a LM may have to "ripple up" the LM chain until
> > we get to a LM which can actually answer it. But is that
> > conceptually flawed?
>
> It doesn't work that way with the Knuth approach.

Yes, I see: because we use a two-pass approach, line breaking and page 
breaking use different sets of Knuth elements.

>
> HTH
>
Yes it did, thanks.

> Jeremias Maerki

Manuel

Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

On 21.09.2005 09:52:00 Manuel Mall wrote:
> On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote:
<snip/>
> > The only 
> > reevaluation will happen if we start to implement support for the
> > "changing available IPD" problem, i.e. when the available IPD is
> > different from page to page within the same page-sequence. In this
> > case we will need to be able to recreate the element list from an
> > arbitrary former break possibility on forward which means that all
> > decisions are reevaluated from this point on due to changed
> > environmental factors (the IPD). Even the line-breaking has to be
> > redone, although the inline element list will not have to be
> > recreated.
> >
> 
> Jeremias, can you explain to me why we have to reevaluate? 

Let me explain by showing the flow of events for a simple block with a
long text in it:

- BlockLM.getNextKnuthElements is called indirectly by the PSLM which
provides the available IPD in the layout context.
- The BlockLM calls on the LineLM to do the line-breaking passing on the
available IPD.
- The LineLM creates the element list for its content. (IPD is
irrelevant here)
- The LineLM does the line breaking (by using the IPD value in the
layout context) and creates a box element for each created line and
penalties between lines.
- The BlockLM receives the element list from the LineLM and integrates
it into its own element list.
- The resulting element list is returned to the parents until we're back
in the PSLM which invokes the breaker. All break decisions for the whole
sequence are generated at this point.
- Let's assume the contents don't fit in the first page and we get at
least one break point. Let's assume further that the first page is an A4
portrait page and the second is an A4 landscape page, i.e. the available
IPD is bigger on the second page.
- The first page is generated normally, all lines are properly generated.
- The second page has the problem that the available IPD is different,
but the line breaks have all been done for the IPD of the first page,
because the LineLM has no chance of knowing on which page it will end up:
the break decisions in b-p-direction are done later.
- Now the layout has to be backtracked to the point of the first break
after which the available IPD is different. From there on the lines have
to be rebroken to get line boxes which work with the right IPD. Note
that the element list for the inline stuff doesn't have to be recreated.
Only the line breaker gets a new available IPD. Due to different break
decisions you get a different set of b-p-d elements which also have to
be broken over the pages.
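
The effect described in the list above can be illustrated with a small 
sketch. Note the assumptions: breakLines is a hypothetical helper that 
uses simple greedy first-fit breaking, not FOP's Knuth-based line 
breaker, and the word widths are made-up numbers. It only shows that the 
same inline content produces a different set of lines (and therefore 
different b-p-d elements) when the IPD changes.

```java
import java.util.ArrayList;
import java.util.List;

public class IpdRebreak {

    /**
     * Greedy first-fit line breaking over word widths. A deliberate
     * simplification, not the total-fit algorithm FOP uses.
     * Returns the width of each resulting line.
     */
    static List<Integer> breakLines(int[] wordWidths, int spaceWidth, int ipd) {
        List<Integer> lineWidths = new ArrayList<>();
        int current = 0;
        for (int w : wordWidths) {
            if (current == 0) {
                current = w;                      // first word on the line
            } else if (current + spaceWidth + w <= ipd) {
                current += spaceWidth + w;        // word still fits
            } else {
                lineWidths.add(current);          // close the line
                current = w;                      // start a new one
            }
        }
        if (current > 0) {
            lineWidths.add(current);
        }
        return lineWidths;
    }

    public static void main(String[] args) {
        // The inline "element list" (here: word widths) is built once.
        int[] words = {40, 30, 50, 20, 60, 30, 40, 50};

        // Only the available IPD differs between the two runs.
        List<Integer> narrow = breakLines(words, 10, 100);
        List<Integer> wide = breakLines(words, 10, 200);

        System.out.println("IPD 100: " + narrow.size() + " lines " + narrow);
        System.out.println("IPD 200: " + wide.size() + " lines " + wide);
    }
}
```

The word list is reused unchanged for both runs, which corresponds to the 
point that the inline element list does not have to be recreated; only 
the line breaking is redone.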

It might be possible to optimize the amount of lines that are precreated
to avoid unnecessary work in these conditions but these will be
heuristics and guesses and most of all only approximations. At the very
least this will get messy. The biggest problem will be tables whose LMs
have to be made restartable [1]. The other LMs will probably be easier.
We used to have a restart mechanism before introducing the Knuth
approach. Something like that will need to be reintroduced at some point.

[1] http://wiki.apache.org/xmlgraphics-fop/TableLayout/BreakHandling

> Can't the 
> line breaking code simply ask the LM for the new IPD when it inserts a 
> page break and then continue with the new IPD? 

It's not that immediate as the LineLM has to do the line breaking before
the page breaking can be done.

> Yes, the question on the 
> new IPD when asked of a LM may have to "ripple up" the LM chain until we 
> get to a LM which can actually answer it. But is that conceptually 
> flawed?

It doesn't work that way with the Knuth approach.

HTH

Jeremias Maerki

Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)

Posted by Manuel Mall <mm...@arcus.com.au>.

On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote:
> On 20.09.2005 23:50:12 Andreas L Delmelle wrote:
> > Hi,
> >
<snip/>
> >
> > Here goes:
> > I get the impression that the elements for borders and those for
> > the content of the cells are created in one single pass, which
> > seems to be the source of the so-called 'interaction problem'
> > --IIC, this refers to the situation where, for example, we have
> > already generated the AFTER border elements for the first two
> > cells, while it's only when generating the elements for the third
> > cell that a break is triggered. So, the obtained border- and
> > content-elements become invalid, and need to be re-evaluated
> > (possibly taking the footer into account). Is this a correct
> > assessment of the issue?
>
> Unfortunately not. I get the impression that you haven't understood,
> yet, how the Knuth approach works. We don't reevaluate any decisions
> in this approach, but rather calculate ALL(!) possible decisions
> beforehand and incorporate them into the element list we generate.
> The breaker will merely choose a break possibility and the addAreas
> stage will paint the results given the break decision. 

Andreas, FWIW I struggled with a similar misconception when doing the 
conditional borders on inline elements (which is a very simplified 
version of your problem). Luckily Luca set me straight very quickly. 
At every break possibility (space, hyphen, hyphenation point, ... for 
the ipd) you have to add all the Knuth elements required to model a) 
what happens if a break is inserted and b) what happens if no break is 
created. Amazingly the Knuth approach of just using box, penalty and 
glue elements can handle that. However, the only thing the Knuth 
elements and the line breaking algorithm do is to reserve the correct 
amount of space. Adding the actual borders is done when the areas are 
created.
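
A minimal sketch of the "model both outcomes" idea, under stated 
assumptions: the Element class and widthUpTo are hypothetical and much 
simpler than FOP's own Knuth element classes, and the sequence shown 
covers only a border at the end of a line, not the matching border at 
the start of the next one. It relies on one property of the box/glue/ 
penalty model: a penalty's width counts only if the break is taken at 
that penalty.

```java
import java.util.Arrays;
import java.util.List;

public class ConditionalBorderElements {

    enum Kind { BOX, GLUE, PENALTY }

    /** Hypothetical, stripped-down element: kind and width only. */
    static final class Element {
        final Kind kind;
        final int width;

        Element(Kind kind, int width) {
            this.kind = kind;
            this.width = width;
        }
    }

    /**
     * Space reserved on the line that ends at index breakAt.
     * breakAt == -1 means "no break": the whole list stays on one line.
     * Boxes and glue before the break always count; a penalty's width
     * counts only when the break is taken at that penalty.
     */
    static int widthUpTo(List<Element> elements, int breakAt) {
        int end = breakAt < 0 ? elements.size() : breakAt;
        int total = 0;
        for (int i = 0; i < end; i++) {
            Element e = elements.get(i);
            if (e.kind != Kind.PENALTY) {
                total += e.width;
            }
        }
        if (breakAt >= 0 && elements.get(breakAt).kind == Kind.PENALTY) {
            total += elements.get(breakAt).width;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Element> elements = Arrays.asList(
                new Element(Kind.BOX, 100),    // content before the break possibility
                new Element(Kind.PENALTY, 5),  // border width, reserved only on a break
                new Element(Kind.GLUE, 10),    // inter-word space
                new Element(Kind.BOX, 80));    // content after the break possibility

        System.out.println("no break:     " + widthUpTo(elements, -1));
        System.out.println("break taken:  " + widthUpTo(elements, 1));
    }
}
```

Both outcomes are encoded in the one list before any break is chosen, and 
the elements only reserve space; painting the border itself would still 
happen later, when the areas are created.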

> The only 
> reevaluation will happen if we start to implement support for the
> "changing available IPD" problem, i.e. when the available IPD is
> different from page to page within the same page-sequence. In this
> case we will need to be able to recreate the element list from an
> arbitrary former break possibility on forward which means that all
> decisions are reevaluated from this point on due to changed
> environmental factors (the IPD). Even the line-breaking has to be
> redone, although the inline element list will not have to be
> recreated.
>

Jeremias, can you explain to me why we have to reevaluate? Can't the 
line breaking code simply ask the LM for the new IPD when it inserts a 
page break and then continue with the new IPD? Yes, the question on the 
new IPD when asked of a LM may have to "ripple up" the LM chain until we 
get to a LM which can actually answer it. But is that conceptually 
flawed?

<snip/>
> Jeremias Maerki

Manuel

Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Sep 21, 2005, Jeremias & Manuel wrote:

<snip  />

Aaaah, now I'm finally beginning to see it...

Thanks a lot to the both of you for these clarifications!


Cheers,

Andreas

Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

On 20.09.2005 23:50:12 Andreas L Delmelle wrote:
> Hi,
> 
> Jeremias, Luca or Simon will probably be able to make the most sense 
> out of it, but if there's anyone else that can add a few comments, feel 
> free to do so.
> (FYI: This is completely separate from my idea to move the 
> border-collapsing to the FOTree.)
> 
> Now, I'm still not fully at home in the Knuth element generation 
> algorithm, so I don't know exactly whether what I'm about to describe 
> is at all feasible/doable. Maybe it's currently already done this way, 
> and I'm missing the point somewhere... In that case: sorry for the 
> noise. :-/
> 
> Here goes:
> I get the impression that the elements for borders and those for the 
> content of the cells are created in one single pass, which seems to be 
> the source of the so-called 'interaction problem' --IIC, this refers to 
> the situation where, for example, we have already generated the AFTER 
> border elements for the first two cells, while it's only when 
> generating the elements for the third cell that a break is triggered. 
> So, the obtained border- and content-elements become invalid, and need 
> to be re-evaluated (possibly taking the footer into account).
> Is this a correct assessment of the issue?

Unfortunately not. I get the impression that you haven't understood, yet,
how the Knuth approach works. We don't reevaluate any decisions in this
approach, but rather calculate ALL(!) possible decisions beforehand and
incorporate them into the element list we generate. The breaker will
merely choose a break possibility and the addAreas stage will paint the
results given the break decision. The only reevaluation will happen if
we start to implement support for the "changing available IPD" problem,
i.e. when the available IPD is different from page to page within the
same page-sequence. In this case we will need to be able to recreate the
element list from an arbitrary earlier break possibility onward, which
means that all decisions are reevaluated from this point on due to
changed environmental factors (the IPD). Even the line-breaking has to
be redone, although the inline element list will not have to be
recreated.

This calculation of all possible decisions when generating the element
list is exactly the same problem I'm currently facing with space
resolution. I have to precalculate all space resolution scenarios for
every single break possibility in order to be able to create the right
element list. Mind-breaking, I tell you...... :-)

> Am I correct when I say that this problem doesn't pose itself when the 
> break would occur in the first cell of the row(group)?
>
> If so, I'm wondering whether it could help if the element generation 
> for row(groups) were split up into two (possibly three) passes and 
> made to look like the following (in pseudo-code):
> 
> while( rowIterator.hasNext() ) {
>    if( firstRowGroupInPageOrColumn ) {
>      generateBeforeBorderElements();
>    }
>    generateAfterBorderElements();
>    generateContentElements();
> }
> 
> So, by the time we get to generating boxes/glues/penalties for the 
> content of the cells, we would already have the minimum/maximum widths 
> for *all* possible AFTER border elements in the row.
> The generateAfterBorderElements() step would create two element lists:
> - one to use if there is no page- or column-break
> - an alternate list to use in case the content triggers a break (which 
> would then include all elements for the footer, if any)

I don't think something like that is possible. During my analysis I
found that the effective borders influence the Knuth element generation
a lot. You can't separate the borders from the content. Have a look at
the notes in the Wiki. They show this interaction. It's all documented
there. The element list generation is fully implemented for the separate
border model. For the collapsing border model, several examples are
documented and fully calculated. The only thing left is the algorithm to
handle all the little difficulties arising from the collapsing border
model. The most important pages for implementing the collapsing border
model are these:
http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/RowBorder
http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/RowBorder2
http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/HfIntegrationInSteppingAlgorithm

> Maybe both lists could be made to include the elements for the AFTER 
> padding as well (? since we have to iterate over the cells/grid-units 
> anyway).
> 
> Eventually only one of the two lists will be merged with the content 
> element list, depending on the situation after the content element list 
> is completely known, but it would become a matter of inserting the right 
> list (and discarding the incorrect one --at least, throwing away its 
> elements).
> 
> The only drawback I immediately see is that the 
> generateAfterBorderElements() step would have to make the comparison 
> with the footer- or table-borders for each and every row, unless we 
> were to do this only in case the remaining page- or column-BPD has 
> dropped below a certain threshold.
> 
> The only remaining problems would then be that:
> a) there may be row(groups) whose content is so large that the 
> remaining BPD is more than enough before the content's elements are 
> generated, but only drops below the threshold during the 
> generateContentElements() step.
> b) there's always the possibility of a forced break, regardless of the 
> remaining BPD
> 
> The creation of the alternate element list should therefore be 
> implemented as a separate step that can be triggered either during 
> generateAfterBorderElements() or generateContentElements().
> 
> In any case, besides gaining certainty about min- or max-border-widths, 
> splitting up the element generation in 2-3 passes would allow us to 
> gain a few hints on the content to get an idea of the probability of a 
> page- or column-break.
> I mean: without actually triggering creation of a full element list for 
> the content, we could maybe do a quick traverse of the FOTree-fragment 
> contained in each cell to see if any of its descendants have a break-* 
> property specified.
> To make an even more educated guess, perhaps we could even perform some 
> off-hand calculations based on the average font-size, the number of 
> blocks, the number of characters of the descendant FOText nodes, the 
> content-height for contained images... But this all *without* 
> generating the elements. Only minimal communication with the actual 
> childLMs in that step, placing the focus on the FONode-elements (= the 
> list returned by TableCell.getChildNodes()) and their properties.
> 
> 
> Does this make any sense?

Hmmmmmm. Unless I'm totally mistaken, you're off course, unfortunately.

Jeremias Maerki