You are viewing a plain text version of this content. The canonical link for it is here.

Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2006/08/22 23:35:12 UTC

Q: Line-layout -> separating element-list creation / line-breaking

Hi all,


I've been doing some thinking myself about the separation between  
element-list creation and actual line-breaking, and went wandering  
through the related sources. Just checking here if my initial  
estimates are correct.

At first glance, a possible separation seems already prepared in some  
way:
LineLM.createLineBreaks() is already a separate method, so could in  
theory be called independently from LineLM.collectInlineKnuthElements 
() --maybe depending on a boolean switch passed into  
LineLM.getNextKnuthElements() (as a first step?)

The only gotcha seems to be the TODO on lines 580-582 in  
LineLayoutManager, where the 'available IPD' is passed into  
collectInlineKnuthElements().
Following the path of this variable, it is afterwards only used to  
create a lineFiller to use for the first/last lines in the paragraph,  
only in case of non-centered alignment. The min/opt/max values of the  
filler are used to calculate the width for the first/last penalties  
and/or glues added in Paragraph.startSequence() / .endSequence().
That's the only influence that 'available IPD' has on the created  
element list.

The only elements invalidated by a later IPD change seem to be those  
start- or end-of-paragraph elements, but they could be either  
replaced or adjusted by the breaker. Provided, of course, that they  
are made distinguishable from the other elements --either by marking  
them, or providing accessors for Paragraph.ignoreAtStart  
and .ignoreAtEnd (?)

If I see it correctly, what we want to go towards is a two-phased  
approach:

1) getBaseKnuthElements()
    (or: collectKnuthElements())
    the lower level LM constructs its base element list completely,
    and passes it up to the ancestor LM for inspection and detection
    of content-width constraints (and other constraints, like forced
    breaks and keeps, widows/orphans)

2) processElementList(from, to, context, alignment)
    (would correspond to the current getNextKnuthElements(), if called
     with 'from' equal to zero, and 'to' equal to list.size())
    the higher level LM issues repeated requests to the lower
    level LM to return element sublists from A to B, but now
    including the line-breaks, when given certain ipd- and alignment
    constraints

and it's step 2) that offers a possibility to restart the breaking  
from one arbitrary element up to another.

The start(from) and end(to) element for the sublists could be  
estimated, based on the content-width, line-height and the context's  
ipd/bpd-constraints:
if minimum-line-height X allows L lines to be stacked in BP direction
=> room for a maximum of (L * available-IPD) in unbroken content-width

If the actual last element that fits in the context turns out not to  
be the last element of the sublist, a pointer should be kept to the  
first element in the remainder of the processed list for which  
baseList.indexOf() does not return -1, in order to obtain the start  
element for the next call.

As such, before making each 'processElementList(from, to, ...)'  
request, the higher level LM gets a chance to update the layout- 
context if there is a change in available IPD --for instance due to a  
(deferred) side-float, or a change in page-master and/or reference- 
orientation.

The breaker, on the other hand, could suspend or interrupt the  
process, for instance upon encountering a start-float, so that the  
next iteration starts with the float's element list (= finish current  
line, return control to the higher level LM, treat the float, if  
necessary update layout-context's ipd, and resume the breaking in the  
updated context)
If the float fits on the page, then this should offer a reasonable  
guarantee that it will end up starting on the line following the one  
that contains its anchor.


Am I making things too simple? Too complicated? :/


Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Patrick Paul <pp...@yahoo.ca>.

Andreas L Delmelle wrote:

> On Aug 23, 2006, at 20:16, Patrick Paul wrote:
>
>> <snip />
>> You are right. Thanks for the suggestion. I would even say that  
>> "determine" is misleading since the min and max content widths are  
>> calculated when getNextKnuthElements is called. So it maybe it  would 
>> be better to name it getMinMaxContentWidth() ?
>
>
> That would be fine, if the method's return type reflects this.

It returns an object that I called MinMax. This object has to accessors: 
int getMin() and int getMax().

>
>> Maybe I shouldn't be calculating the min and max content widths  
>> while element creation is done ?
>
>
> Not sure what you mean here: the elements need to be created to be  
> able to determine how large your minimum-width is, so your min/max  
> calculation needs to be performed in between the element creation and  
> the start of the actual breaking...

Yes I agree. I am actually doing this WHILE the elements are being 
created. Maybe this in an incorect way of doing things ? I will soon be 
sending a little patch using this getMinMaxContentWidth() method. No 
more static variable, hehehe ! So you will see what I mean.

>
> Cheers,
>
> Andreas
>
Thank you,

Patrick

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 23, 2006, at 20:16, Patrick Paul wrote:

> <snip />
> You are right. Thanks for the suggestion. I would even say that  
> "determine" is misleading since the min and max content widths are  
> calculated when getNextKnuthElements is called. So it maybe it  
> would be better to name it getMinMaxContentWidth() ?

That would be fine, if the method's return type reflects this.

> Maybe I shouldn't be calculating the min and max content widths  
> while element creation is done ?

Not sure what you mean here: the elements need to be created to be  
able to determine how large your minimum-width is, so your min/max  
calculation needs to be performed in between the element creation and  
the start of the actual breaking...

Cheers,

Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Patrick Paul <pp...@yahoo.ca>.

Andreas L Delmelle wrote:

> On Aug 23, 2006, at 17:44, Patrick Paul wrote:
>
>> <snip />
>
>
>> Why not keep getNextKnuthElements() wich does the line breaking  
>> depending on a boolean switch ? Then depending on wether the  element 
>> list already exists, the (modified) collectKnuthElements()  method is 
>> called.
>
>
> FTM, such a boolean switch would probably be the wisest move. Not  
> changing too much at once.
>
>> This is my perspective for the auto-table layout, since I am  
>> implementing a new LayoutManager method called  
>> determineMinMaxTextWidths() which returns an object containing the  
>> two values needed to calculate optimum column widths. These values  
>> will go up the hierarchy up to the TableLM where the optimal column  
>> widths will be determined. So no need to return the actual element  
>> lists, right ?
>
>
> Correct. To determine the minimum column-widths, you don't actually  
> need the element lists themselves, only the width of widest piece of  
> unbreakable content in the list.
>
> As a nit, the suggested name determineMinMaxTextWidths() is maybe a  
> bit misleading, since it is not merely text we're dealing with (could  
> also be images... => determineMinMaxContentWidth()?)

You are right. Thanks for the suggestion. I would even say that 
"determine" is misleading since the min and max content widths are 
calculated when getNextKnuthElements is called. So it maybe it would be 
better to name it getMinMaxContentWidth() ? Maybe I shouldn't be 
calculating the min and max content widths while element creation is done ?

>
>
>
> Cheers,
>
> Andreas
>
Patrick

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 23, 2006, at 17:44, Patrick Paul wrote:

> <snip />

> Why not keep getNextKnuthElements() wich does the line breaking  
> depending on a boolean switch ? Then depending on wether the  
> element list already exists, the (modified) collectKnuthElements()  
> method is called.

FTM, such a boolean switch would probably be the wisest move. Not  
changing too much at once.

> This is my perspective for the auto-table layout, since I am  
> implementing a new LayoutManager method called  
> determineMinMaxTextWidths() which returns an object containing the  
> two values needed to calculate optimum column widths. These values  
> will go up the hierarchy up to the TableLM where the optimal column  
> widths will be determined. So no need to return the actual element  
> lists, right ?

Correct. To determine the minimum column-widths, you don't actually  
need the element lists themselves, only the width of widest piece of  
unbreakable content in the list.

As a nit, the suggested name determineMinMaxTextWidths() is maybe a  
bit misleading, since it is not merely text we're dealing with (could  
also be images... => determineMinMaxContentWidth()?)

Cheers,

Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Patrick Paul <pp...@yahoo.ca>.

Andreas L Delmelle wrote:

>
> Hi all,

Hello everyone ;-)

> I've been doing some thinking myself about the separation between  
> element-list creation and actual line-breaking, and went wandering  
> through the related sources. Just checking here if my initial  
> estimates are correct.
>
> At first glance, a possible separation seems already prepared in some  
> way:
> LineLM.createLineBreaks() is already a separate method, so could in  
> theory be called independently from LineLM.collectInlineKnuthElements 
> () --maybe depending on a boolean switch passed into  
> LineLM.getNextKnuthElements() (as a first step?)
>
> The only gotcha seems to be the TODO on lines 580-582 in  
> LineLayoutManager, where the 'available IPD' is passed into  
> collectInlineKnuthElements().
> Following the path of this variable, it is afterwards only used to  
> create a lineFiller to use for the first/last lines in the paragraph,  
> only in case of non-centered alignment. The min/opt/max values of the  
> filler are used to calculate the width for the first/last penalties  
> and/or glues added in Paragraph.startSequence() / .endSequence().
> That's the only influence that 'available IPD' has on the created  
> element list.
>
> The only elements invalidated by a later IPD change seem to be those  
> start- or end-of-paragraph elements, but they could be either  
> replaced or adjusted by the breaker. Provided, of course, that they  
> are made distinguishable from the other elements --either by marking  
> them, or providing accessors for Paragraph.ignoreAtStart  and 
> .ignoreAtEnd (?)
>
> If I see it correctly, what we want to go towards is a two-phased  
> approach:
>
> 1) getBaseKnuthElements()
>    (or: collectKnuthElements())
>    the lower level LM constructs its base element list completely,
>    and passes it up to the ancestor LM for inspection and detection
>    of content-width constraints (and other constraints, like forced
>    breaks and keeps, widows/orphans)
>
> 2) processElementList(from, to, context, alignment)
>    (would correspond to the current getNextKnuthElements(), if called
>     with 'from' equal to zero, and 'to' equal to list.size())
>    the higher level LM issues repeated requests to the lower
>    level LM to return element sublists from A to B, but now
>    including the line-breaks, when given certain ipd- and alignment
>    constraints
>
Why not keep getNextKnuthElements() wich does the line breaking 
depending on a boolean switch ? Then depending on wether the element 
list already exists, the (modified) collectKnuthElements() method is called.

This is my perspective for the auto-table layout, since I am 
implementing a new LayoutManager method called 
determineMinMaxTextWidths() which returns an object containing the two 
values needed to calculate optimum column widths. These values will go 
up the hierarchy up to the TableLM where the optimal column widths will 
be determined. So no need to return the actual element lists, right ?

> and it's step 2) that offers a possibility to restart the breaking  
> from one arbitrary element up to another.
>
> The start(from) and end(to) element for the sublists could be  
> estimated, based on the content-width, line-height and the context's  
> ipd/bpd-constraints:
> if minimum-line-height X allows L lines to be stacked in BP direction
> => room for a maximum of (L * available-IPD) in unbroken content-width
>
> If the actual last element that fits in the context turns out not to  
> be the last element of the sublist, a pointer should be kept to the  
> first element in the remainder of the processed list for which  
> baseList.indexOf() does not return -1, in order to obtain the start  
> element for the next call.
>
> As such, before making each 'processElementList(from, to, ...)'  
> request, the higher level LM gets a chance to update the layout- 
> context if there is a change in available IPD --for instance due to a  
> (deferred) side-float, or a change in page-master and/or reference- 
> orientation.
>
> The breaker, on the other hand, could suspend or interrupt the  
> process, for instance upon encountering a start-float, so that the  
> next iteration starts with the float's element list (= finish current  
> line, return control to the higher level LM, treat the float, if  
> necessary update layout-context's ipd, and resume the breaking in the  
> updated context)
> If the float fits on the page, then this should offer a reasonable  
> guarantee that it will end up starting on the line following the one  
> that contains its anchor.
>
>
> Am I making things too simple? Too complicated? :/
>
>
> Andreas
>
Patrick

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Simon Pepping <sp...@leverkruid.eu>.

On Wed, Aug 23, 2006 at 09:59:14PM +0200, Andreas L Delmelle wrote:
> On Aug 23, 2006, at 21:16, Patrick Paul wrote:
> 
> Simon's correct. I'll clarify a bit more. From what I could tell,  
> Patrick's patch has added code at the right places, it only seems a  
> bit awkward to me to import stuff related to table-layout into the  
> LineLayoutManager (even if it is nothing more than a TableHelper).

I had not realized you were talking about tables. I thought it was
connected with the need to find a layout solution for page sequences
with different page widths.
 
> (Note: this part was not wrong per se in the patch. It just seems  
> there currently is no other way, precisely because list-creation and  
> breaking are performed in the same method. The only place where it  
> can be entered is somewhere between those two statements -- 
> collectInlineKnuthElements() and createLineBreaks().)
> 
> A more correct approach --a matter of taste?-- would be for the  
> TableContentLM to 'collect' the accumulated list of its descendant  
> LMs, perform the min/max-width calculation, update the LayoutContext,  
> and send it back down to the LineBreaker.
> 
> Maybe someone sees another approach that I'm overlooking?

The LineLM is the master of the inline content, so to say, so it
governs the line breaking process. Therefore it may be the right
object to be requested to calculate the width of the content. But
maybe it would be better to add a getTotalWidth method to the
Paragraph.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 25, 2006, at 03:07, Andreas L Delmelle wrote:

> <snip />
> Rough sketch:
> Page-breaking initializes first, and prefetches say five blank pages.
> From these it constructs one long context, call it one big page -- 
> or better: one region-body--, with ipd changes at a known set of  
> coordinates.
>
> This context is then passed to the FlowLM, and further down. If the  
> LineLM is aware of its bp-coordinate, the LineBreaker will know  
> about changes in ipd, but it does not need to know that it is a  
> page-break, which it isn't at this point.

Thinking this through again:
If this approach is feasible, then this would mean that line-breaking  
does not need to be made restartable to deal with ipd-changes between  
pages. Only suspendible, which it already is.

Exception for the last page: while the ipd change for the last page  
could always be included in the context, the line-breaking still  
needs to be made aware at which coordinate this change will occur. It  
will be one of the known coordinates, but it cannot be said precisely  
which one.
Even if the interaction between line- and page-breaking can be kept  
to a minimum, I think the line-breaker still would have to check  
somehow at each of the known bp-coordinates how much content is  
remaining, in order to predict the last ipd change.

Later,

Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 24, 2006, at 21:01, Simon Pepping wrote:

> On Thu, Aug 24, 2006 at 12:40:25PM +0200, Andreas L Delmelle wrote:
>> This could actually be done in a single thread, chopping the process
>> up into discrete parts, and bouncing control from (a) to (b) to (c)
>> and back.
>> Implementing this with multiple threads would be much trickier, since
>> the access to all the lists needs to be synchronized.
>
> An interesting side effect of the total-fit solution to page breaking
> is that no interaction is required between line and page breaking. The
> two are independent and one is performed before the other. That
> changes when the line layout depends on the page layout, e.g. due to
> different page widths or due to side floats. This makes the
> programming more complex.

Indeed it would, but not necessarily increasing the computational  
complexity...

Rough sketch:
Page-breaking initializes first, and prefetches say five blank pages.
 From these it constructs one long context, call it one big page --or  
better: one region-body--, with ipd changes at a known set of  
coordinates.

This context is then passed to the FlowLM, and further down. If the  
LineLM is aware of its bp-coordinate, the LineBreaker will know about  
changes in ipd, but it does not need to know that it is a page-break,  
which it isn't at this point.

Coming back up means either overflow of the context, or no more content.
In the latter case control is handed over to the PageBreaker.
In the first case, for a total-fit solution the set of active nodes  
is kept active, the next five pages are fetched, and the process  
repeats, until the content runs out, at which point the PageBreaker  
kicks in. A more memory-friendly algorithm could decide to run the  
PageBreaker sooner, and continue the process with the last best node  
as a starting-point.

Cheers,

Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Simon Pepping <sp...@leverkruid.eu>.

On Thu, Aug 24, 2006 at 12:40:25PM +0200, Andreas L Delmelle wrote:
> On Aug 23, 2006, at 21:59, Andreas L Delmelle wrote:
> 
> ><snip />
> >The other consideration, but that would be for a more distant  
> >future, is to be able to have three different threads:
> >- fo creation (a)
> >- base layoutengine initialization (b)
> >- actual breaking/layout (c)
> 
> A bit more elaboration:
> This could actually be done in a single thread, chopping the process  
> up into discrete parts, and bouncing control from (a) to (b) to (c)  
> and back.
> Implementing this with multiple threads would be much trickier, since  
> the access to all the lists needs to be synchronized.

An interesting side effect of the total-fit solution to page breaking
is that no interaction is required between line and page breaking. The
two are independent and one is performed before the other. That
changes when the line layout depends on the page layout, e.g. due to
different page widths or due to side floats. This makes the
programming more complex.

The same problem would be introduced by your above process. Bouncing
control between line and page layout introduces an interaction between
the two.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 23, 2006, at 21:59, Andreas L Delmelle wrote:

> <snip />
> The other consideration, but that would be for a more distant  
> future, is to be able to have three different threads:
> - fo creation (a)
> - base layoutengine initialization (b)
> - actual breaking/layout (c)

A bit more elaboration:
This could actually be done in a single thread, chopping the process  
up into discrete parts, and bouncing control from (a) to (b) to (c)  
and back.
Implementing this with multiple threads would be much trickier, since  
the access to all the lists needs to be synchronized.

The main difference between total-fit and first-fit would be as to  
how far all the lists ultimately grow. Even then, suppose that we  
have computed the possible breaks for the first ten pages, what is  
the probability that even a total-fit algorithm would need to revisit  
the scores?

The direction I'm thinking in, lies somewhere between first- and  
total-fit. For relatively short sequences --say 20 pages max.-- FOP's  
default behavior would offer the same result as a real total-fit.  
What we also have to keep in mind, though, is the possibility of  
arbitrarily large sequences. An unconditional total-fit might become  
next to impossible to deal with for systems with average memory specs.
I'm thinking this could be controlled by a configuration option: the  
maximum number of pages that is considered before the finishing part  
of the layout phase is triggered.
A threshold of zero would mean that the areas are added immediately  
upon reaching the first break --literal first-fit-- and there is  
neither a need nor a possibility to revisit the breaks on earlier  
pages. Setting this threshold to say 500, would make sure that one  
always achieves the best possible layout for 500 subsequent pages,  
provided that memory constraints allow it.

Just so as not to confuse a total-fit algorithm with performing  
everything in one big loop. While the needed context is much larger,  
it still can be gathered piece-by-piece. Only, certain decisions need  
to be deferred until the complete/total context is known.

The important part is that the opportunity would be opened for any  
type of algorithm to release parts of the FOTree and the layout tree  
earlier, by constraining the window to a maximum number of pages.

The question is whether the base element list creation could be  
considered a part of the layout-initialization process, or whether it  
belongs in the actual layout phase. Auto-layout for tables is one  
particular area where it may prove interesting to consider the base  
element list creation as a preparatory step rather than as a part of  
the layout process itself.

Maybe there are others?

Later,

Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 24, 2006, at 02:39, Patrick Paul wrote:

> Andreas L Delmelle wrote:
>> <snip />
>> A more correct approach --a matter of taste?-- would be for the   
>> TableContentLM to 'collect' the accumulated list of its  
>> descendant  LMs, perform the min/max-width calculation, update the  
>> LayoutContext,  and send it back down to the LineBreaker.
>
> When you say 'collect' I suppose you mean that the TableContentLM  
> whould have to keep all the LinkedLists of elements for the entire  
> table.

Not necessarily the TableContentLM itself, but it could be made such  
that the higher-level LM at least gets to see the raw element list  
before it is handed over to the breaker. I was thinking this could  
offer opportunities in other areas as well, although I now have a  
hard time coming up with any concrete examples.
Maybe this could help avoid certain expensive operations during the  
breaking stage, if it can be derived from seeing the base element  
list that they will be useless/futile anyway (?)

> I haven't looked at this closely, but wouldn't that also imply some  
> kind of mechanism on the TableCellLM and BlockLMs when going back  
> down to the LineBreaker to perform the line-breaking?

Indeed, they would have to be involved in this as well.

>
> If I understand all this correctly, it would imply holding the  
> element lists for an entire table. Wouldn't that be expensive in  
> terms of memory?

Isn't that what we currently already do? For an entire page-sequence  
no less... It is indeed expensive, but necessary for a total-fit  
algorithm. No way around that, I'm afraid.


> <snip />
>
>> Strictly speaking, apart from the  arbitrary default 'available  
>> ipd / # columns', you don't have a  correct ipd to give to the  
>> Breaker. If you do perform the breaking,  however, the element  
>> list will also contain elements that have been  added by the Breaker.
>
> A way around this is to give an arbitrarly large default available  
> ipd the first time you run through the algorithm. This reduces the  
> extra line-breaking.

Good point!

>> Although I haven't done too much investigation on whether they   
>> influence the content-width calculation, this still means these   
>> elements are created for no reason at all.
>> IOW: this would be a wasted trip through the breaking algorithm.
>
> Correct. This is what is happening for now. I run through the  
> getNextKnuthElements twice: the first time to determine the minimum  
> and maximum widths of the content, and the second to do the proper  
> line-breaking.


Cheers,


Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Patrick Paul <pp...@yahoo.ca>.

Andreas L Delmelle wrote:

> On Aug 23, 2006, at 21:16, Patrick Paul wrote:
>
>> Simon Pepping wrote:
>>
>>> On Tue, Aug 22, 2006 at 11:35:12PM +0200, Andreas L Delmelle wrote:
>>
>
>>>> I've been doing some thinking myself about the separation  between  
>>>> element-list creation and actual line-breaking, and went  
>>>> wandering  through the related sources. Just checking here if my  
>>>> initial  estimates are correct.
>>>>
>>>
>>> Which problem are you trying to solve? Paragraph layout where the  line
>>> width changes in the paragraph?
>>>
>> I don't know exactly what Adreas has in mind be we actually need  
>> this for the auto-table layout.
>
>
> Simon's correct. I'll clarify a bit more. From what I could tell,  
> Patrick's patch has added code at the right places, it only seems a  
> bit awkward to me to import stuff related to table-layout into the  
> LineLayoutManager (even if it is nothing more than a TableHelper).

I have since improved the patch to use a getMinMaxContentWidths() method 
that passes the values up the LMs. I will submit it soon. So the only 
thing remaning in the way for now is the fact that the element lists are 
created twice (since the line-breaking can be disabled using a switch).

>
> (Note: this part was not wrong per se in the patch. It just seems  
> there currently is no other way, precisely because list-creation and  
> breaking are performed in the same method. The only place where it  
> can be entered is somewhere between those two statements -- 
> collectInlineKnuthElements() and createLineBreaks().)
>
> A more correct approach --a matter of taste?-- would be for the  
> TableContentLM to 'collect' the accumulated list of its descendant  
> LMs, perform the min/max-width calculation, update the LayoutContext,  
> and send it back down to the LineBreaker.

When you say 'collect' I suppose you mean that the TableContentLM whould 
have to keep all the LinkedLists of elements for the entire table. I 
haven't looked at this closely, but wouldn't that also imply some kind 
of mechanism on the TableCellLM and BlockLMs when going back down to the 
LineBreaker to perform the line-breaking?

If I understand all this correctly, it would imply holding the element 
lists for an entire table. Wouldn't that be expensive in terms of memory?

>
> Maybe someone sees another approach that I'm overlooking?
>
> The breaking algorithm needs to be given a LayoutContext with a  
> certain available amount of ipd. Since the base element list does not  
> depend on this amount --apart from the start- or end-of-paragraph  
> elements-- this means it should be feasible to separate the former  
> from the latter.

This is how I see it too.

>
> In auto-table layout, we have to take into account the possibility  
> that none of the column-widths are specified. Both minimum and  
> maximum depend on the content.

I think that this is not only a possibility but a common situation. I 
find that I often want to let the formatter determine the optimal column 
(and table) widths depending on the content.

> Strictly speaking, apart from the  arbitrary default 'available ipd / 
> # columns', you don't have a  correct ipd to give to the Breaker. If 
> you do perform the breaking,  however, the element list will also 
> contain elements that have been  added by the Breaker.

A way around this is to give an arbitrarly large default available ipd 
the first time you run through the algorithm. This reduces the extra 
line-breaking.

> Although I haven't done too much investigation on whether they  
> influence the content-width calculation, this still means these  
> elements are created for no reason at all.
> IOW: this would be a wasted trip through the breaking algorithm.

Correct. This is what is happening for now. I run through the 
getNextKnuthElements twice: the first time to determine the minimum and 
maximum widths of the content, and the second to do the proper 
line-breaking.

> <snip />
>
>> This may be possible, but it does not seem to be in the spirit of the
>> algorithm. The algorithm calculates the best layout up to the current
>> linebreak. You can always interrupt the calculation and resume later,
>> provided you keep the list of active nodes. If you want N lines, then
>> you do not determine in advance the end element. You proceed until you
>> have all active nodes that end line N, and then determine the best of
>> them. But it is even better to continue to the end of the paragraph,
>> taking the modified line width into account. That gives you the best
>> total-fit for the whole paragraph. In that total-fit, the layout for
>> the first N lines may be suboptimal, but then it is offset by a better
>> result for the remaining lines.
>
>
> OK. This makes sense. I'm going to chew some more on this.
>
> The other consideration, but that would be for a more distant future,  
> is to be able to have three different threads:
> - fo creation
> - base layoutengine initialization
> - actual breaking/layout
>
> Those could run in concert, so the layoutengine initialization does  
> not need to wait for the endPageSequence() event, but could already  
> be started much earlier. If it reaches a point where it makes sense  
> to start with the break computation, the third thread can be started.  
> Ultimately, this will end up running out of input, and control will  
> be returned to the second thread. If the second thread runs out of  
> FOs, it returns control to the first.
>
> In theory, it then also becomes possible, for both a total-fit and  
> first-fit algorithm, to release certain FOs before the layout for the  
> entire page-sequence is finished.
>
>
> Cheers,
>
> Andreas

Patrick

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Aug 23, 2006, at 21:16, Patrick Paul wrote:

> Simon Pepping wrote:
>
>> On Tue, Aug 22, 2006 at 11:35:12PM +0200, Andreas L Delmelle wrote:

>>> I've been doing some thinking myself about the separation  
>>> between  element-list creation and actual line-breaking, and went  
>>> wandering  through the related sources. Just checking here if my  
>>> initial  estimates are correct.
>>>
>>
>> Which problem are you trying to solve? Paragraph layout where the  
>> line
>> width changes in the paragraph?
>>
> I don't know exactly what Adreas has in mind be we actually need  
> this for the auto-table layout.

Simon's correct. I'll clarify a bit more. From what I could tell,  
Patrick's patch has added code at the right places, it only seems a  
bit awkward to me to import stuff related to table-layout into the  
LineLayoutManager (even if it is nothing more than a TableHelper).

(Note: this part was not wrong per se in the patch. It just seems  
there currently is no other way, precisely because list-creation and  
breaking are performed in the same method. The only place where it  
can be entered is somewhere between those two statements -- 
collectInlineKnuthElements() and createLineBreaks().)

A more correct approach --a matter of taste?-- would be for the  
TableContentLM to 'collect' the accumulated list of its descendant  
LMs, perform the min/max-width calculation, update the LayoutContext,  
and send it back down to the LineBreaker.

Maybe someone sees another approach that I'm overlooking?

The breaking algorithm needs to be given a LayoutContext with a  
certain available amount of ipd. Since the base element list does not  
depend on this amount --apart from the start- or end-of-paragraph  
elements-- this means it should be feasible to separate the former  
from the latter.

In auto-table layout, we have to take into account the possibility  
that none of the column-widths are specified. Both minimum and  
maximum depend on the content. Strictly speaking, apart from the  
arbitrary default 'available ipd / # columns', you don't have a  
correct ipd to give to the Breaker. If you do perform the breaking,  
however, the element list will also contain elements that have been  
added by the Breaker.
Although I haven't done too much investigation on whether they  
influence the content-width calculation, this still means these  
elements are created for no reason at all.
IOW: this would be a wasted trip through the breaking algorithm.

<snip />
> This may be possible, but it does not seem to be in the spirit of the
> algorithm. The algorithm calculates the best layout up to the current
> linebreak. You can always interrupt the calculation and resume later,
> provided you keep the list of active nodes. If you want N lines, then
> you do not determine in advance the end element. You proceed until you
> have all active nodes that end line N, and then determine the best of
> them. But it is even better to continue to the end of the paragraph,
> taking the modified line width into account. That gives you the best
> total-fit for the whole paragraph. In that total-fit, the layout for
> the first N lines may be suboptimal, but then it is offset by a better
> result for the remaining lines.

OK. This makes sense. I'm going to chew some more on this.

The other consideration, but that would be for a more distant future,  
is to be able to have three different threads:
- fo creation
- base layoutengine initialization
- actual breaking/layout

Those could run in concert, so the layoutengine initialization does  
not need to wait for the endPageSequence() event, but could already  
be started much earlier. If it reaches a point where it makes sense  
to start with the break computation, the third thread can be started.  
Ultimately, this will end up running out of input, and control will  
be returned to the second thread. If the second thread runs out of  
FOs, it returns control to the first.

In theory, it then also becomes possible, for both a total-fit and  
first-fit algorithm, to release certain FOs before the layout for the  
entire page-sequence is finished.

Cheers,

Andreas

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Patrick Paul <pp...@yahoo.ca>.

Simon Pepping wrote:

>On Tue, Aug 22, 2006 at 11:35:12PM +0200, Andreas L Delmelle wrote:
>  
>
>>Hi all,
>>
>>
>>I've been doing some thinking myself about the separation between  
>>element-list creation and actual line-breaking, and went wandering  
>>through the related sources. Just checking here if my initial  
>>estimates are correct.
>>    
>>
>
>Which problem are you trying to solve? Paragraph layout where the line
>width changes in the paragraph?
>  
>
I don't know exactly what Adreas has in mind be we actually need this 
for the auto-table layout.

Regards,

Patrick

Re: Q: Line-layout -> separating element-list creation / line-breaking

Posted by Simon Pepping <sp...@leverkruid.eu>.

On Tue, Aug 22, 2006 at 11:35:12PM +0200, Andreas L Delmelle wrote:
> 
> Hi all,
> 
> 
> I've been doing some thinking myself about the separation between  
> element-list creation and actual line-breaking, and went wandering  
> through the related sources. Just checking here if my initial  
> estimates are correct.

Which problem are you trying to solve? Paragraph layout where the line
width changes in the paragraph?

> At first glance, a possible separation seems already prepared in some  
> way:
> LineLM.createLineBreaks() is already a separate method, so could in  
> theory be called independently from LineLM.collectInlineKnuthElements 
> () --maybe depending on a boolean switch passed into  
> LineLM.getNextKnuthElements() (as a first step?)

That separation is certainly there. It is a basis of the Knuth
algorithm. The Knuth elements are an abstract representation of the
text, containing precisely the information that the linebreaking
algorithm requires. That representation must be known when the
algorithm starts, and forms its input.

> The only gotcha seems to be the TODO on lines 580-582 in  
> LineLayoutManager, where the 'available IPD' is passed into  
> collectInlineKnuthElements().

Indeed, this invalidates the independence of the two phases, and
cannot be fundamental.

... snip...

> The start(from) and end(to) element for the sublists could be  
> estimated, based on the content-width, line-height and the context's  
> ipd/bpd-constraints:
> if minimum-line-height X allows L lines to be stacked in BP direction
> => room for a maximum of (L * available-IPD) in unbroken content-width
> 
> If the actual last element that fits in the context turns out not to  
> be the last element of the sublist, a pointer should be kept to the  
> first element in the remainder of the processed list for which  
> baseList.indexOf() does not return -1, in order to obtain the start  
> element for the next call.
> 
> As such, before making each 'processElementList(from, to, ...)'  
> request, the higher level LM gets a chance to update the layout- 
> context if there is a change in available IPD --for instance due to a  
> (deferred) side-float, or a change in page-master and/or reference- 
> orientation.
> 
> The breaker, on the other hand, could suspend or interrupt the  
> process, for instance upon encountering a start-float, so that the  
> next iteration starts with the float's element list (= finish current  
> line, return control to the higher level LM, treat the float, if  
> necessary update layout-context's ipd, and resume the breaking in the  
> updated context)
> If the float fits on the page, then this should offer a reasonable  
> guarantee that it will end up starting on the line following the one  
> that contains its anchor.

This may be possible, but it does not seem to be in the spirit of the
algorithm. The algorithm calculates the best layout up to the current
linebreak. You can always interrupt the calculation and resume later,
provided you keep the list of active nodes. If you want N lines, then
you do not determine in advance the end element. You proceed until you
have all active nodes that end line N, and then determine the best of
them. But it is even better to continue to the end of the paragraph,
taking the modified line width into account. That gives you the best
total-fit for the whole paragraph. In that total-fit, the layout for
the first N lines may be suboptimal, but then it is offset by a better
result for the remaining lines.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu