Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2006/09/01 01:21:22 UTC

Re: Implementation of hyphenation-keep property

On Aug 31, 2006, at 20:59, Jeremias Maerki wrote:

Yeah, it was a lot, wasn't it? :)
Actually, I was preparing the post myself as yours came in, so I  
decided to c&p it into a reply, since it did seem to address the same  
basic issue: interaction between line- and page-breaking.

> What I can deduce from this is that my suspicion is probably correct
> that implementing hyphenation-keep will be quite tricky with the
> current code. I assume we have to make a few changes to make page- and
> line-breaking interact more closely (for "changing available IPD"
> etc.).

Looking closer at hyphenation-keep: this indeed seems very tricky in  
the current situation.

I got the idea while debugging the behavior when processing the  
disabled testcase 'page-breaking_4.xml'.
Notice that the FlowLM's getNextKnuthElements() is currently only
called once, which triggers line-breaking for the entire
page-sequence given a LayoutContext with ipd equal to that of the
first page's region-body. The second page is only prepared after all
line-breaks and the first page-break have been computed.

Note that this results in optimal line-layout for non-paginated
media. It would be perfect for an HTML page (or a page with
indefinite height) to compute all line-breaks in one go. Given the
current page-breaking algorithm, which performs outstandingly, nobody
will notice a thing if all pages use the same page-master. From the
moment you add a second one with a slightly narrower/wider
region-body, you're in trouble, it seems. :/
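
To make the problem concrete, here is a toy sketch in Java. It is not
FOP code: the class and method names are made up for illustration, and
it uses a simple greedy breaker instead of the Knuth-style algorithm.
It only shows that lines broken once for the first page's ipd no longer
fit when they land on a page with a narrower region-body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in FOP.
class FixedIpdSketch {

    /**
     * Greedy (first-fit) line breaking with a single fixed ipd.
     * Returns the width of each resulting line.
     */
    static List<Integer> breakLines(int[] wordWidths, int spaceWidth, int ipd) {
        List<Integer> lineWidths = new ArrayList<>();
        int current = 0; // width of the line being filled
        for (int w : wordWidths) {
            int needed = (current == 0) ? w : current + spaceWidth + w;
            if (current > 0 && needed > ipd) {
                lineWidths.add(current); // close the line, start a new one
                current = w;
            } else {
                current = needed;
            }
        }
        if (current > 0) {
            lineWidths.add(current);
        }
        return lineWidths;
    }

    /**
     * Counts lines that are too wide for the second page, assuming the
     * first linesOnFirstPage lines stay on page 1 and the rest go to page 2.
     */
    static int countOverflows(List<Integer> lineWidths, int linesOnFirstPage,
                              int secondPageIpd) {
        int overflows = 0;
        for (int i = linesOnFirstPage; i < lineWidths.size(); i++) {
            if (lineWidths.get(i) > secondPageIpd) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        int[] words = {40, 40, 40, 40, 40, 40, 40, 40};
        // All lines are broken for the first page's ipd (100)...
        List<Integer> lines = breakLines(words, 10, 100);
        // ...but the second page's region-body is only 80 wide.
        System.out.println(lines + ", overflows on page 2: "
                + countOverflows(lines, 2, 80));
    }
}
```

With a wider second page the lines would not overflow, but they would
be shorter than the available ipd, which is wrong in a different way.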

Taking into account the possibility of varying ipd due to different  
page-masters or deferred side-floats... this definitely needs to be  
changed.

As a side-note, looking at memory consumption: it would at least
offer a chance to perform cleanup if the algorithm jumps from
page-layout to line-layout and back.
Looking at computational complexity: it remains a hypothesis for the
moment, but I'm guessing that in certain areas this may even be
reduced by the extra info that would become available to both
breaking algorithms if they interact.
I haven't looked too closely yet at the current implementation of  
multi-column layout, but it seems like a mechanism for getting the  
changed available ipd already exists somehow. How are the line-breaks  
rearranged/recomputed there precisely?


Cheers,

Andreas

Re: Implementation of hyphenation-keep property

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Sep 1, 2006, at 14:45, Jeremias Maerki wrote:

>> <snip />
>> I got the idea while debugging the behavior when processing the
>> disabled testcase 'page-breaking_4.xml'.
>> Notice that the FlowLM's getNextKnuthElements() is currently only
>> called once, which triggers line-breaking for the entire page-
>> sequence given a LayoutContext with ipd equal to that of the first
>> page's region-body. The second page is only prepared after all line-
>> breaks and the first page-break have been computed.
>
> Doesn't sound like total-fit anymore, more like best-fit. Vincent and
> Simon said total-fit for page breaking is very important.

I understand and completely agree, which is why I explicitly looked  
for options that would, by default, come down roughly to the same  
thing we have now, only refined/corrected.

To me it looks like right now we have a total-fit page-breaking,  
combined with a possible non-fit line-breaking. Since line-breaking  
is unaware of available bpd, it cannot take into account any  
overflows in that direction (and implied ipd-changes for the next  
lines).
As you point out, available ipd can only change in case of forced  
breaks/span changes. But even then it seems like we cannot precisely  
determine which page we're on, unless we'd run the PageBreaker over  
the element-list up to that point. That's something I'd like to  
avoid, since this would break the total-fit page-breaking (unless the  
page-breaks are recomputed afterwards...)

If the goal is to achieve a total-fit for both line- and page-breaks,
and we don't want to waste resources on a whole bunch of unnecessary
break computations, then it seems the wisest thing to do first is to
see whether we can move the page-generation in such a way that the
line-breaking algorithm is always aware of the 'current' page, while
no actual page-breaks are computed yet. The latter can still wait
until we have collected the full list of line-breaks.
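
One possible reading of this idea, as a rough Java sketch. Everything
here is hypothetical (names, the greedy breaker, the fixed per-line
advance); it is not how FOP works today. The line-breaker keeps a
provisional page index based on the bpd consumed so far and uses that
page's ipd, without committing to any page-break. The real page-breaks
would still be computed afterwards over the full list of lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in FOP.
class PageAwareBreakSketch {

    /**
     * Greedy line breaking that tracks a provisional 'current' page.
     * pageIpd[i] and pageBpd[i] describe the region-body of page i; the
     * last entry is reused for all further pages. Every line is assumed
     * to advance by the same lineAdvance in the block-progression
     * direction. Returns, for each line, the provisional page index whose
     * ipd it was broken for.
     */
    static List<Integer> breakLines(int[] wordWidths, int spaceWidth,
                                    int lineAdvance, int[] pageIpd, int[] pageBpd) {
        List<Integer> pageOfLine = new ArrayList<>();
        int page = 0;     // provisional page, not a committed page-break
        int usedBpd = 0;  // bpd consumed on the provisional page
        int current = 0;  // width of the line being filled
        for (int w : wordWidths) {
            int needed = (current == 0) ? w : current + spaceWidth + w;
            if (current > 0 && needed > pageIpd[page]) {
                pageOfLine.add(page);
                usedBpd += lineAdvance;
                // If the next line would not fit, provisionally move on.
                if (usedBpd + lineAdvance > pageBpd[page]
                        && page < pageIpd.length - 1) {
                    page++;
                    usedBpd = 0;
                }
                current = w;
            } else {
                current = needed;
            }
        }
        if (current > 0) {
            pageOfLine.add(page);
        }
        return pageOfLine;
    }

    public static void main(String[] args) {
        int[] words = {40, 40, 40, 40, 40, 40, 40, 40};
        // Page 1: ipd 100, room for two lines. Page 2: ipd 80.
        System.out.println(breakLines(words, 10, 10,
                new int[] {100, 80}, new int[] {20, 100}));
    }
}
```

Note that the provisional page tracking in this sketch is itself a
greedy estimate, so it only hints at the direction; keeping total-fit
for the page-breaks is exactly the hard part.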

Could get quite tricky, though. Seems like the line-breaking  
algorithm would also need to take into account space-before/-after,  
in order to register correct bp-advancements... bp-advancement is not  
simply line-height, but line-height + resolved space-before +  
resolved space-after. :/
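
Spelled out as a small Java sketch, just to pin down the arithmetic.
The names are made up, and the space values are assumed to be already
resolved (space resolution itself is not shown). Lengths are plain
integers, for example millipoints.

```java
// Hypothetical illustration only; none of these names exist in FOP.
class BpAdvanceSketch {

    /** bp-advancement of one line: line-height plus resolved spaces. */
    static int bpAdvance(int lineHeight, int spaceBefore, int spaceAfter) {
        return lineHeight + spaceBefore + spaceAfter;
    }

    /** Total bp-advancement of a run of lines (arrays of equal length). */
    static int totalAdvance(int[] lineHeights, int[] spaceBefore, int[] spaceAfter) {
        int total = 0;
        for (int i = 0; i < lineHeights.length; i++) {
            total += bpAdvance(lineHeights[i], spaceBefore[i], spaceAfter[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three lines of one block: space-before applies to the first
        // line, space-after to the last one.
        int[] heights = {14400, 14400, 14400};
        int[] before  = {6000, 0, 0};
        int[] after   = {0, 0, 6000};
        int naive = 3 * 14400; // line-height only
        System.out.println("naive=" + naive
                + " actual=" + totalAdvance(heights, before, after));
    }
}
```

A line-breaker that only counted line-height would underestimate the
consumed bpd here by the 12000 units of space.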


Later,

Andreas

Re: Implementation of hyphenation-keep property

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 01.09.2006 01:21:22 Andreas L Delmelle wrote:
> On Aug 31, 2006, at 20:59, Jeremias Maerki wrote:
> 
> Yeah, it was a lot, wasn't it? :)
> Actually, I was preparing the post myself as yours came in, so I  
> decided to c&p it into a reply, since it did seem to address the same  
> basic issue: interaction between line- and page-breaking.
> 
> > What I can deduce from this is that my suspicion is probably correct
> > that implementing hyphenation-keep will be quite tricky with the
> > current code. I assume we have to make a few changes to make page- and
> > line-breaking interact more closely (for "changing available IPD"
> > etc.).
> 
> Looking closer at hyphenation-keep: this indeed seems very tricky in  
> the current situation.
> 
> I got the idea while debugging the behavior when processing the  
> disabled testcase 'page-breaking_4.xml'.
> Notice that the FlowLM's getNextKnuthElements() is currently only
> called once, which triggers line-breaking for the entire
> page-sequence given a LayoutContext with ipd equal to that of the
> first page's region-body. The second page is only prepared after all
> line-breaks and the first page-break have been computed.

Doesn't sound like total-fit anymore, more like best-fit. Vincent and
Simon said total-fit for page breaking is very important.

> Note that this results in optimal line-layout for non-paginated  
> media. It would be perfect for an HTML page (or a page with  
> indefinite height) to compute all line-breaks in one go. Given the  
> current page-breaking algorithm, which performs outstandingly, nobody  
> will notice a thing if all pages use the same page-master. From the
> moment you add a second one with a slightly narrower/wider
> region-body, you're in trouble, it seems. :/

Yep, that's one of the current problems.

> Taking into account the possibility of varying ipd due to different  
> page-masters or deferred side-floats... this definitely needs to be  
> changed.
> 
> As a side-note, looking at memory consumption: it would at least
> offer a chance to perform cleanup if the algorithm jumps from
> page-layout to line-layout and back.
> Looking at computational complexity: it remains a hypothesis for the
> moment, but I'm guessing that in certain areas this may even be
> reduced by the extra info that would become available to both
> breaking algorithms if they interact.
> I haven't looked too closely yet at the current implementation of  
> multi-column layout, but it seems like a mechanism for getting the  
> changed available ipd already exists somehow. How are the line-breaks  
> rearranged/recomputed there precisely?

Available IPD can currently only change if you have a forced page break.
Here the page breaking process restarts. Line-breaks are not recomputed
AFAIK. If they were, I'd be much happier. The only elements we currently
have for multi-column layout are:
* Column balancing logic
* Span-change logic (page breaking is interrupted on a span-change,
  different block sequence)

Otherwise, a column is just like any other page with only one column.
That's why we can't do keep.within-page, yet.

Jeremias Maerki