Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2005/09/20 23:50:12 UTC
Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Hi,
Jeremias, Luca or Simon will probably be able to make the most sense
out of it, but if there's anyone else that can add a few comments, feel
free to do so.
(FYI: This is completely separate from my idea to move the
border-collapsing to the FOTree.)
Now, I'm still not fully at home in the Knuth element generation
algorithm, so I don't know exactly whether what I'm about to describe
is at all feasible/doable. Maybe it's currently already done this way,
and I'm missing the point somewhere... In that case: sorry for the
noise. :-/
Here goes:
I get the impression that the elements for borders and those for the
content of the cells are created in one single pass, which seems to be
the source of the so-called 'interaction problem' --IIC, this refers to
the situation where, for example, we have already generated the AFTER
border elements for the first two cells, while it's only when
generating the elements for the third cell that a break is triggered.
So, the obtained border- and content-elements become invalid, and need
to be re-evaluated (possibly taking the footer into account).
Is this a correct assessment of the issue?
Am I correct when I say that this problem doesn't pose itself when the
break would occur in the first cell of the row(group)?
If so, I'm wondering whether it could help if the element generation
for row(groups) were split up into two (possibly three) passes and
made to look like the following (in pseudo-code):
  while( rowIterator.hasNext() ) {
    if( firstRowGroupInPageOrColumn ) {
      generateBeforeBorderElements();
    }
    generateAfterBorderElements();
    generateContentElements();
  }
So, by the time we get to generating boxes/glues/penalties for the
content of the cells, we would already have the minimum/maximum widths
for *all* possible AFTER border elements in the row.
The generateAfterBorderElements() step would create two element lists:
- one to use if there is no page- or column-break
- an alternate list to use in case the content triggers a break (which
would then include all elements for the footer, if any)
Maybe both lists could be made to include the elements for the AFTER
padding as well (? since we have to iterate over the cells/grid-units
anyway).
Eventually only one of the two lists will be merged with the content
element list, depending on the situation after the content element list
is completely known, but it would become a matter of inserting the right
list (and discarding the incorrect one --at least, throwing away its
elements).
The only drawback I immediately see is that the
generateAfterBorderElements() step would have to make the comparison
with the footer- or table-borders for each and every row, unless we
were to do this only in case the remaining page- or column-BPD has
dropped below a certain threshold.
The only remaining problems would then be that:
a) there may be row(groups) whose content is so large that the
remaining BPD is more than enough before the content's elements are
generated, but only drops below the threshold during the
generateContentElements() step.
b) there's always the possibility of a forced break, regardless of the
remaining BPD
The creation of the alternate element list should therefore be
implemented as a separate step that can be triggered either during
generateAfterBorderElements() or generateContentElements().
In any case, besides gaining certainty about min- or max-border-widths,
splitting up the element generation in 2-3 passes would allow us to
gain a few hints on the content to get an idea of the probability of a
page- or column-break.
I mean: without actually triggering creation of a full element list for
the content, we could maybe do a quick traverse of the FOTree-fragment
contained in each cell to see if any of its descendants have a break-*
property specified.
To make an even more educated guess, perhaps we could even perform some
off-hand calculations based on the average font-size, the number of
blocks, the number of characters of the descendant FOText nodes, the
content-height for contained images... But this all *without*
generating the elements. Only minimal communication with the actual
childLMs in that step, placing the focus on the FONode-elements (= the
list returned by TableCell.getChildNodes()) and their properties.
Does this make any sense?
Cheers,
Andreas
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 21 Sep 2005 04:57 pm, Jeremias Maerki wrote:
> On 21.09.2005 09:52:00 Manuel Mall wrote:
> > On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote:
>
> <snip/>
>
> >
> > Jeremias, can you explain to me why we have to reevaluate?
>
> Let me explain by showing the flow of events for a simple block with
> a long text in it:
>
<snip/>
> It's not that immediate as the LineLM has to do the line breaking
> before the page breaking can be done.
Ok, I see: we do all the line breaking first, followed by the page
breaking. Therefore the LineLM, when creating a break, has no idea
whether this will become a page break and "happily" continues line
breaking with the given IPD.
>
> > Yes, the question on the
> > new IPD when asked of an LM may have to "ripple up" the LM chain until
> > we get to an LM which can actually answer it. But is that
> > conceptually flawed?
>
> It doesn't work that way with the Knuth approach.
Yes, I see: because we use a two-pass approach, line breaking and page
breaking use different sets of Knuth elements.
>
> HTH
>
Yes it did, thanks.
> Jeremias Maerki
Manuel
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 21.09.2005 09:52:00 Manuel Mall wrote:
> On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote:
<snip/>
> > The only
> > reevaluation will happen if we start to implement support for the
> > "changing available IPD" problem, i.e. when the available IPD is
> > different from page to page within the same page-sequence. In this
> > case we will need to be able to recreate the element list from an
> > arbitrary former break possibility on forward which means that all
> > decisions are reevaluated from this point on due to changed
> > environmental factors (the IPD). Even the line-breaking has to be
> > redone, although the inline element list will not have to be
> > recreated.
> >
>
> Jeremias, can you explain to me why we have to reevaluate?
Let me explain by showing the flow of events for a simple block with a
long text in it:
- BlockLM.getNextKnuthElements is called indirectly by the PSLM which
provides the available IPD in the layout context.
- The BlockLM calls on the LineLM to do the line-breaking passing on the
available IPD.
- The LineLM creates the element list for its content. (IPD is
irrelevant here)
- The LineLM does the line breaking (by using the IPD value in the
layout context) and creates a box element for each created line and
penalties between lines.
- The BlockLM receives the element list from the LineLM and integrates
it into its own element list.
- The resulting element list is returned to the parents until we're back
in the PSLM which invokes the breaker. All break decisions for the whole
sequence are generated at this point.
- Let's assume the contents don't fit in the first page and we get at
least one break point. Let's assume further that the first page is an A4
portrait page and the second is an A4 landscape page, i.e. the available
IPD is bigger on the second page.
- The first page is generated normally, all lines are properly generated.
- The second page has the problem that the available IPD is different,
but the line breaks have all been done for the IPD of the first page,
because the LineLM has no chance of knowing on which page it will end up,
since the break decisions in b-p-direction are done later.
- Now the layout has to be backtracked to the point of the first break
after which the available IPD is different. From there on the lines have
to be rebroken to get line boxes which work with the right IPD. Note
that the element list for the inline stuff doesn't have to be recreated.
Only the line breaker gets a new available IPD. Due to different break
decisions you get a different set of b-p-d elements which also have to
be broken over the pages.
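To make the flow above a bit more concrete, here is a minimal sketch. It is
not FOP code: the class and method names are made up for illustration, a
greedy first-fit breaker stands in for the Knuth line breaker, and the
capacity of the first page is simply given as a number of lines. It only
shows that the word list (the inline element list) is built once, while the
line breaks after the first page break are redone with the new IPD.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only, not FOP code. All names here are hypothetical.
class RebreakSketch {

    // Greedy first-fit line breaking (a simplification, not Knuth's total-fit).
    // Returns, for each line, the index of the first word on that line.
    static List<Integer> breakLines(int[] wordWidths, int start, int spaceWidth, int ipd) {
        List<Integer> lineStarts = new ArrayList<>();
        int i = start;
        while (i < wordWidths.length) {
            lineStarts.add(i);
            int used = wordWidths[i++];   // a word always goes on the line, even if too wide
            while (i < wordWidths.length && used + spaceWidth + wordWidths[i] <= ipd) {
                used += spaceWidth + wordWidths[i++];
            }
        }
        return lineStarts;
    }

    // Break everything for the first page's IPD. If the lines do not fit on
    // the first page, keep the lines before the page break and re-break the
    // remaining words with the second page's IPD. The word list is reused.
    static List<Integer> layoutAcrossTwoPages(int[] wordWidths, int spaceWidth,
            int ipdFirstPage, int linesOnFirstPage, int ipdSecondPage) {
        List<Integer> first = breakLines(wordWidths, 0, spaceWidth, ipdFirstPage);
        if (first.size() <= linesOnFirstPage) {
            return first;                 // no page break, nothing to redo
        }
        int restartWord = first.get(linesOnFirstPage);
        List<Integer> result = new ArrayList<>(first.subList(0, linesOnFirstPage));
        result.addAll(breakLines(wordWidths, restartWord, spaceWidth, ipdSecondPage));
        return result;
    }

    public static void main(String[] args) {
        int[] words = {30, 30, 30, 30, 30, 30, 30, 30, 30, 30};
        // Narrow first page (IPD 100, two lines), wider second page (IPD 150).
        System.out.println(layoutAcrossTwoPages(words, 10, 100, 2, 150)); // prints [0, 2, 4, 8]
    }
}
```

With the first page's IPD alone the ten words would need five lines. After
the restart at word 4 the wider second page needs only two lines for the
rest, so the set of b-p-d elements to be broken over the pages changes too.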
It might be possible to optimize the number of lines that are precreated
to avoid unnecessary work in these conditions, but these will be
heuristics and guesses and most of all only approximations. At the very
least this will get messy. The biggest problem will be tables, whose LMs
have to be made restartable [1]. The other LMs will probably be easier.
We used to have a restart mechanism before introducing the Knuth
approach. Something like that will need to be reintroduced at some point.
[1] http://wiki.apache.org/xmlgraphics-fop/TableLayout/BreakHandling
> Can't the
> line breaking code simply ask the LM for the new IPD when it inserts a
> page break and then continue with the new IPD?
It's not that immediate as the LineLM has to do the line breaking before
the page breaking can be done.
> Yes, the question on the
> new IPD when asked of an LM may have to "ripple up" the LM chain until we
> get to an LM which can actually answer it. But is that conceptually
> flawed?
It doesn't work that way with the Knuth approach.
HTH
Jeremias Maerki
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote:
> On 20.09.2005 23:50:12 Andreas L Delmelle wrote:
> > Hi,
> >
<snip/>
> >
> > Here goes:
> > I get the impression that the elements for borders and those for
> > the content of the cells are created in one single pass, which
> > seems to be the source of the so-called 'interaction problem'
> > --IIC, this refers to the situation where, for example, we have
> > already generated the AFTER border elements for the first two
> > cells, while it's only when generating the elements for the third
> > cell that a break is triggered. So, the obtained border- and
> > content-elements become invalid, and need to be re-evaluated
> > (possibly taking the footer into account). Is this a correct
> > assessment of the issue?
>
> Unfortunately not. I get the impression that you haven't understood,
> yet, how the Knuth approach works. We don't reevaluate any decisions
> in this approach, but rather calculate ALL(!) possible decisions
> beforehand and incorporate them into the element list we generate.
> The breaker will merely choose a break possibility and the addAreas
> stage will paint the results given the break decision.
Andreas, FWIW I struggled with a similar misconception when doing the
conditional borders on inline elements (which is a very simplified
version of your problem). Luckily Luca set me straight very quickly.
At every break possibility (space, hyphen, hyphenation point, ... for
the ipd) you have to add all the Knuth elements required to model a)
what happens if a break is inserted and b) what happens if no break is
created. Amazingly the Knuth approach of just using box, penalty and
glue elements can handle that. However, the only thing the Knuth
elements and the line breaking algorithm do is to reserve the correct
amount of space. Adding the actual borders is done when the areas are
created.
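A small sketch of what "modelling both cases" means, using the generic
box/glue/penalty rules from Knuth's line-breaking model. These are not FOP's
classes. The names and widths are hypothetical: a penalty of width 2 stands
for a border that only appears if the line is broken there, and a glue of
width 10 stands for the space that is dropped at a break.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT FOP's classes. Names and widths are hypothetical.
class BreakScenarioSketch {
    enum Kind { BOX, GLUE, PENALTY }

    static final class Element {
        final Kind kind;
        final int width;
        Element(Kind kind, int width) { this.kind = kind; this.width = width; }
    }

    // Natural width when no break is taken inside the sequence:
    // boxes and glue count, penalty widths do not.
    static int widthWithoutBreak(List<Element> elements) {
        int total = 0;
        for (Element e : elements) {
            if (e.kind != Kind.PENALTY) {
                total += e.width;
            }
        }
        return total;
    }

    // Natural width of the line that ends at breakIndex (a penalty or glue).
    // Elements before the break count as above. A penalty at the break adds
    // its own width, glue at the break is discarded.
    static int widthIfBreakAt(List<Element> elements, int breakIndex) {
        int total = widthWithoutBreak(elements.subList(0, breakIndex));
        Element brk = elements.get(breakIndex);
        if (brk.kind == Kind.PENALTY) {
            total += brk.width;
        }
        return total;
    }

    static List<Element> example() {
        return Arrays.asList(
            new Element(Kind.BOX, 50),      // content before the break possibility
            new Element(Kind.PENALTY, 2),   // border width, reserved only if we break here
            new Element(Kind.GLUE, 10),     // inter-word space, present only without a break
            new Element(Kind.BOX, 40));     // content after the break possibility
    }

    public static void main(String[] args) {
        List<Element> els = example();
        System.out.println(widthWithoutBreak(els));   // prints 100
        System.out.println(widthIfBreakAt(els, 1));   // prints 52
    }
}
```

Both outcomes are already encoded in the one list, so the breaker can pick
either without anything being regenerated. As said above, the elements only
reserve the space. Nothing here paints a border.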
> The only
> reevaluation will happen if we start to implement support for the
> "changing available IPD" problem, i.e. when the available IPD is
> different from page to page within the same page-sequence. In this
> case we will need to be able to recreate the element list from an
> arbitrary former break possibility on forward which means that all
> decisions are reevaluated from this point on due to changed
> environmental factors (the IPD). Even the line-breaking has to be
> redone, although the inline element list will not have to be
> recreated.
>
Jeremias, can you explain to me why we have to reevaluate? Can't the
line breaking code simply ask the LM for the new IPD when it inserts a
page break and then continue with the new IPD? Yes, the question on the
new IPD when asked of an LM may have to "ripple up" the LM chain until we
get to an LM which can actually answer it. But is that conceptually
flawed?
<snip/>
> Jeremias Maerki
Manuel
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Sep 21, 2005, Jeremias & Manuel wrote:
<snip />
Aaaah, now I'm finally beginning to see it...
Thanks a lot to the both of you for these clarifications!
Cheers,
Andreas
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 20.09.2005 23:50:12 Andreas L Delmelle wrote:
> Hi,
>
> Jeremias, Luca or Simon will probably be able to make the most sense
> out of it, but if there's anyone else that can add a few comments, feel
> free to do so.
> (FYI: This is completely separate from my idea to move the
> border-collapsing to the FOTree.)
>
> Now, I'm still not fully at home in the Knuth element generation
> algorithm, so I don't know exactly whether what I'm about to describe
> is at all feasible/doable. Maybe it's currently already done this way,
> and I'm missing the point somewhere... In that case: sorry for the
> noise. :-/
>
> Here goes:
> I get the impression that the elements for borders and those for the
> content of the cells are created in one single pass, which seems to be
> the source of the so-called 'interaction problem' --IIC, this refers to
> the situation where, for example, we have already generated the AFTER
> border elements for the first two cells, while it's only when
> generating the elements for the third cell that a break is triggered.
> So, the obtained border- and content-elements become invalid, and need
> to be re-evaluated (possibly taking the footer into account).
> Is this a correct assessment of the issue?
Unfortunately not. I get the impression that you haven't understood, yet,
how the Knuth approach works. We don't reevaluate any decisions in this
approach, but rather calculate ALL(!) possible decisions beforehand and
incorporate them into the element list we generate. The breaker will
merely choose a break possibility and the addAreas stage will paint the
results given the break decision. The only reevaluation will happen if
we start to implement support for the "changing available IPD" problem,
i.e. when the available IPD is different from page to page within the
same page-sequence. In this case we will need to be able to recreate the
element list from an arbitrary earlier break possibility onward, which
means that all decisions are reevaluated from this point on due to
changed environmental factors (the IPD). Even the line-breaking has to
be redone, although the inline element list will not have to be
recreated.
This calculation of all possible decisions when generating the element
list is exactly the same problem I'm currently facing with space
resolution. I have to precalculate all space resolution scenarios for
every single break possibility in order to be able to create the right
element list. Mind-breaking, I tell you...... :-)
> Am I correct when I say that this problem doesn't pose itself when the
> break would occur in the first cell of the row(group)?
>
> If so, I'm wondering whether it could help if the element generation
> for row(groups) were split up in two (possibly three passes) and be
> made to look like the following (in pseudo-code):
>
> while( rowIterator.hasNext() ) {
>   if( firstRowGroupInPageOrColumn ) {
>     generateBeforeBorderElements();
>   }
>   generateAfterBorderElements();
>   generateContentElements();
> }
>
> So, by the time we get to generating boxes/glues/penalties for the
> content of the cells, we would already have the minimum/maximum widths
> for *all* possible AFTER border elements in the row.
> The generateAfterBorderElements() step would create two element lists:
> - one to use if there is no page- or column-break
> - an alternate list to use in case the content triggers a break (which
> would then include all elements for the footer, if any)
I don't think something like that is possible. During my analysis I
found that the effective borders influence the Knuth element generation
a lot. You can't separate the borders from the content. Have a look at
the notes in the Wiki. They show this interaction. It's all documented
there. The element list generation is fully implemented for the separate
border model. For the collapsing border model, several examples are
documented and fully calculated. The only thing left is the algorithm to
handle all the little difficulties arising from the collapsing border
model. The most important pages for implementing the collapsing border
model are these:
http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/RowBorder
http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/RowBorder2
http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/HfIntegrationInSteppingAlgorithm
> Maybe both lists could be made to include the elements for the AFTER
> padding as well (? since we have to iterate over the cells/grid-units
> anyway).
>
> Eventually only one of the two lists will be merged with the content
> element list, depending on the situation after the content element list
> is completely known, but it would become a matter of inserting the right
> list (and discarding the incorrect one --at least, throwing away its
> elements).
>
> The only drawback I immediately see is that the
> generateAfterBorderElements() step would have to make the comparison
> with the footer- or table-borders for each and every row, unless we
> were to do this only in case the remaining page- or column-BPD has
> dropped below a certain threshold.
>
> The only remaining problems would then be that:
> a) there may be row(groups) whose content is so large that the
> remaining BPD is more than enough before the content's elements are
> generated, but only drops below the threshold during the
> generateContentElements() step.
> b) there's always the possibility of a forced break, regardless of the
> remaining BPD
>
> The creation of the alternate element list should therefore be
> implemented as a separate step that can be triggered either during
> generateAfterBorderElements() or generateContentElements().
>
> In any case, besides gaining certainty about min- or max-border-widths,
> splitting up the element generation in 2-3 passes would allow us to
> gain a few hints on the content to get an idea of the probability of a
> page- or column-break.
> I mean: without actually triggering creation of a full element list for
> the content, we could maybe do a quick traverse of the FOTree-fragment
> contained in each cell to see if any of its descendants have a break-*
> property specified.
> To make an even more educated guess, perhaps we could even perform some
> off-hand calculations based on the average font-size, the number of
> blocks, the number of characters of the descendant FOText nodes, the
> content-height for contained images... But this all *without*
> generating the elements. Only minimal communication with the actual
> childLMs in that step, placing the focus on the FONode-elements (= the
> list returned by TableCell.getChildNodes()) and their properties.
>
>
> Does this make any sense?
Hmmmmmm. Unless I'm totally mistaken, you're off-course, unfortunately.
Jeremias Maerki