You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Vincent Hennebert <vh...@gmail.com> on 2006/09/12 20:26:49 UTC

Re: [Xmlgraphics-fop Wiki] Update of "PageLayout/PageAndLineBreaking" by SimonPepping

Hi Simon,

I finally took the time to read and digest your Wiki page. This is an
interesting reading. A few comments:


> According to that representation paragraphs with inline text have
> legal linebreak points. I consider those legal linebreak points also
> as legal pagebreak points. In addition, there are legal pagebreak
> points between the vertical elements such as paragraphs and blocks.

One issue that will have to be addressed is that widow or
keep-with-previous settings may invalidate some previously believed
legal breakpoints. In such cases, active nodes which contain those
breakpoints in their chains will have to be deactivated; if they were
the chosen best nodes, some other nodes will somehow have to be
retrieved to replace them. I hope this won't be a too great difficulty.


> Within the inner loop, we consider the page and paragraph layout
> between the active and the current pagebreak point. If the active
> breakpoint is within a paragraph, we calculate the best line breaks
> from that breakpoint to the end of the paragraph. For all complete

Unless the current breakpoint lies in the same paragraph.


> In page independent linebreaking, for each feasible breakpoint the
> best node is retained, which represents the best layout of the
> paragraph up to that point, and which, due to the dynamic principle,
> is part of the best layout of the whole paragraph if that layout uses
> this breakpoint. If line numbers matter, the best node for each line
> number is retained. In page dependent linebreaking, even that is not
> enough. We must retain the best node for each vertical offset on the
> page, because that is the quantity that influences page breaking. This

Good point. This led me to the following thoughts:

Currently the iteration over the active nodes is broken into two loops:
one loop for iterating over the line numbers, one for iterating over the
active nodes associated to each line number. Why? Because if line widths
aren't the same they have an influence on the computation of best line
breaks. Because when considering a given legal breakpoint, we must know
the width of the line it would end in order to be able to compute the
shrinking/stretching for that line. In fact we make a distinction
between line numbers because they determine the /context/ in which
linebreaks are computed. If all the lines had the same widths such a
distinction wouldn't be necessary.

The merging of line- and page-breakings generalizes this problem of
differing contexts. This time, not only the line number counts, but also
the page number, the offset from the beginning of the page, the
out-of-lines to be placed, etc. I think the greatest challenge will be
to identify all the elements which determine that context, and to be
able to compare two contexts and say if they are equivalent or not.
Considering the case 1 you describe on the wiki page, there are only two
different contexts: page number even or odd. In this case the offset
from the beginning of the page doesn't count. In other more complex
documents this may be much more complicated.


> When the linewidths depend on the page number, we need to remember the
> best pagebreak node for each feasible pagebreak point for each page
> number. Otherwise, we only need to remember the best pagebreak node
> for each feasible pagebreak point. Note that the latter condition is
> true in the presence of out-of-line elements, because those are
> related to the content of the page, not the the page number.

Small typo: "not the the page number"


> Optimization opportunity 1: We may need to reuse many times the best
> layout from a breakpoint to the end of the paragraph, and the best
> layout from the start of the paragraph to a breakpoint. Therefore we
> need to store a reference to the best end node for either case in the
> active node. If we wish to take into account the different possible
> heights of the part of the paragraph, we need to store references to a
> set of best end nodes. Especially for long page sequences, a page
> breakpoint may be feasible both for an odd and an even page. In that
> case, we need to store different end points for each, due to the
> different line widths on odd and even pages. This optimization is
> certainly true for the start of the paragraph. There will be many page
> layouts on which the whole paragraph fits.

So, this might be handled automatically by the dynamic algorithm if we
were able to identify the different contexts.


> Optimization opportunity 2: Do we need to consider each active node?
> Or can we already determine that some active nodes will never give a
> better layout than others? Suppose that a paragraph has two feasible
> breakpoints A and B, which have an equal number of lines, or even the
> same height, before and after the page breakpoint. Suppose that B has
> a higher amount of demerits than A. Can we then conclude that B will
> never be part of the best layout, because a better layout can always
> be achieved with A? Yes, we can.

Same here, we would just have to detect that linebreak point A and B
always occur in the same context, and then that B may be discarded.


> In most cases the whole paragraph will fit on the page, and the legal
> pagebreak point after it is not feasible. So we do another paragraph,
> and another. Finally we will reach a feasible pagebreak point, which
> ends page 1. The corresponding linebreak point ends a line which is
> able to fill the page height with maximally tolerable stretching. In
> the next steps we will find more feasible pagebreak points. They
> correspond to different layouts of the last paragraph that starts on
> this page, with the same number of lines, and to layouts with more
> lines, which also fit on the page, until the maximal shrink is
> exceeded. At this point the starting node is deactivated, and the
> active page node list only contains nodes that end page 2.

Unless I've missed something: that end page *1*. Or?


> It should be noted that for each feasible pagebreak node, we restart
> the linebreaking calculations of that page. Of course we must optimize
> this. We could attach a data structure to the starting page node which
> records the results of the calculations done. Those results consist of
> the active line node list, the last linebreak point considered, and
> the widths calculated up to that point. During uninterrupted linebreak
> calculation these data are held in loop variables. Now we must store
> them so that they can be retrieved when we resume the calculation in
> the consideration of the next current pagebreak point. On the first

If the main loop were iterating over line breaks, we wouldn't have to do
this. But perhaps we would have to for pagebreaks then? Which would be
equivalent, but I can't figure out yet.


Regards,
Vincent

Re: [Xmlgraphics-fop Wiki] Update of "PageLayout/PageAndLineBreaking" by SimonPepping

Posted by Simon Pepping <sp...@leverkruid.eu>.
Vincent,

Thanks for your reaction.

On Tue, Sep 12, 2006 at 08:26:49PM +0200, Vincent Hennebert wrote:
> >According to that representation paragraphs with inline text have
> >legal linebreak points. I consider those legal linebreak points also
> >as legal pagebreak points. In addition, there are legal pagebreak
> >points between the vertical elements such as paragraphs and blocks.
> 
> One issue that will have to be addressed is that widow or
> keep-with-previous settings may invalidate some previously believed
> legal breakpoints. In such cases, active nodes which contain those
> breakpoints in their chains will have to be deactivated; if they were
> the chosen best nodes, some other nodes will somehow have to be
> retrieved to replace them. I hope this won't be a too great difficulty.

Indeed, you mentioned it before. It certainly is a complicating factor.
 
> >In page independent linebreaking, for each feasible breakpoint the
> >best node is retained, which represents the best layout of the
> >paragraph up to that point, and which, due to the dynamic principle,
> >is part of the best layout of the whole paragraph if that layout uses
> >this breakpoint. If line numbers matter, the best node for each line
> >number is retained. In page dependent linebreaking, even that is not
> >enough. We must retain the best node for each vertical offset on the
> >page, because that is the quantity that influences page breaking. This
> 
> Good point. This led me to the following thoughts:
> 
> Currently the iteration over the active nodes is broken into two loops:
> one loop for iterating over the line numbers, one for iterating over the
> active nodes associated to each line number. Why? Because if line widths
> aren't the same they have an influence on the computation of best line
> breaks. Because when considering a given legal breakpoint, we must know
> the width of the line it would end in order to be able to compute the
> shrinking/stretching for that line. In fact we make a distinction
> between line numbers because they determine the /context/ in which
> linebreaks are computed. If all the lines had the same widths such a
> distinction wouldn't be necessary.
> 
> The merging of line- and page-breakings generalizes this problem of
> differing contexts. This time, not only the line number counts, but also
> the page number, the offset from the beginning of the page, the
> out-of-lines to be placed, etc. I think the greatest challenge will be
> to identify all the elements which determine that context, and to be
> able to compare two contexts and say if they are equivalent or not.
> Considering the case 1 you describe on the wiki page, there are only two
> different contexts: page number even or odd. In this case the offset
> from the beginning of the page doesn't count. In other more complex
> documents this may be much more complicated.

Yeah, difficult point. I do not think I understand this aspect well
enough.

> >In most cases the whole paragraph will fit on the page, and the legal
> >pagebreak point after it is not feasible. So we do another paragraph,
> >and another. Finally we will reach a feasible pagebreak point, which
> >ends page 1. The corresponding linebreak point ends a line which is
> >able to fill the page height with maximally tolerable stretching. In
> >the next steps we will find more feasible pagebreak points. They
> >correspond to different layouts of the last paragraph that starts on
> >this page, with the same number of lines, and to layouts with more
> >lines, which also fit on the page, until the maximal shrink is
> >exceeded. At this point the starting node is deactivated, and the
> >active page node list only contains nodes that end page 2.
> 
> Unless I've missed something: that end page *1*. Or?

Yes. Thanks.

> >It should be noted that for each feasible pagebreak node, we restart
> >the linebreaking calculations of that page. Of course we must optimize
> >this. We could attach a data structure to the starting page node which
> >records the results of the calculations done. Those results consist of
> >the active line node list, the last linebreak point considered, and
> >the widths calculated up to that point. During uninterrupted linebreak
> >calculation these data are held in loop variables. Now we must store
> >them so that they can be retrieved when we resume the calculation in
> >the consideration of the next current pagebreak point. On the first
> 
> If the main loop were iterating over line breaks, we wouldn't have to do
> this. But perhaps we would have to for pagebreaks then? Which would be
> equivalent, but I can't figure out yet.

I have not thought about the problem in this way. It seems unnatural
to me and also rather equivalent. 

I will update the wiki page as soon as I can reach it again.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu