You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-commits@xmlgraphics.apache.org by Apache Wiki <wi...@apache.org> on 2005/09/30 11:11:24 UTC

[Xmlgraphics-fop Wiki] Update of "KnuthsModel" by LucaFurini

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xmlgraphics-fop Wiki" for change notification.

The following page has been changed by LucaFurini:
http://wiki.apache.org/xmlgraphics-fop/KnuthsModel

The comment on the change is:
Initial description of Knuth's model

New page:
FOP's line breaking and page breaking algorithms both implement Knuth's breaking algorithm.

The detailed description of the model and the algorithm can be found on the paper "Breaking Paragraphs into Lines" by Donald E. Knuth, published in the book "Digital Typography" (Stanford, California: Center for the Study of Language and Information, 1999), (CSLI Lecture Notes, no. 78.), ISBN 1-57586-010-4 (which is surely a book worth reading).

This is just a short summary meant to give a quick overview of the content representation that is the input of the breaking algorithm, such allowing interested people to understand FOP's behaviour even without the need to know how exactly the breaks are actually computed.

The best way to know how the algorithm examines this representation and find the breaks is reading Knuth's paper cited above and look at its implementation inside FOP, which follows quite closely the description.

It is important to note that this representation is not bound to a particular algorithm: it's very easy to use it as a input for a completely different breaking algorithm, such as a first-fit or best-fit one.

Knuth's original algorithm solves the line breaking problem, so this page will speak in terms of "paragraphs", "lines" and "words", but most of the concepts apply to the page breaking problem too: it's enough to substitute "paragraphs" with "page sequences", "lines" with "pages", and "words" with "lines". Just as the line breaking algorithm breaks the content of paragraphs into lines (containing words), the page breaking algorithm breaks the content of page-sequences into pages (containing lines).

= Formal representation of the problem =

A paragraph can be modelled as a sequence of elements

x,,1,, x,,2,, x,,3,, ... x,,n,,

where each x,,i,, can be either a '''box''' element, a '''glue''' element or a '''penalty''' element.

All elements have a '''width''' w; if x,,i,, is the i-th element in the sequence, its width will be w,,i,,.

== Box elements ==

A box element represents an unbreakable piece of content with fixed width: for example a character, a syllable (but only if letter spacing is constant), an inline image, ...

In the context of page breaking, a box element can represent a line.

== Glue elements ==

A glue element represents some kind of space between boxes: for example, a space character between two words.

A glue element x,,i,,, besides its optimal width w,,i,, has a '''stretchability''' y,,i,, and a '''shrinkability''' z,,i,,; this means that the breaking algorithm can set the element's actual width to any value in the range [w,,i,, - z,,i,,, w,,i,, + y,,i,,], although it will try to choose a value as close as possible to w,,i,,.

In the context of page breaking, a glue element can represent the space between two paragraphs.

== Penalty elements ==

A penalty element represents information about a breaking point; it does not represent any piece of content.

The width w,,i,, of a penalty element x,,i,, is considered only if the breaking algorithm chooses that penalty element as a break. For example, in the context of line breaking the width of the hyphen character "-" must be considered only if a word is actually hyphenated.

A penalty element x,,i,, has a '''penalty value''' p,,i,, representing the "aesthetic cost" of the break: a positive cost suggests the breaking algorithm not to use that break point, while a negative cost signals an appealing position for a break. In particular, if p,,i,, is +infinity it forbids a break, while if it is -infinity it forces a break (in other words, there cannot be a better place for a break).

Moreover, a penalty element has a '''flagged''' boolean value: this value marks penalties that should not be chosen to end consecutive lines. For example, in the context of line breaking the hyphenation points inside a word are represented using flagged penalties, so the algorithm tries to avoid the creation of consecutive lines ending with a hyphen.

== Starting and ending elements ==

... coming soon ....

== Feasible breaks ==

The goal of the breaking algorithm is to find an ordered set of indices

b,,0,, < b,,1,, < b,,2,, < ... < b,,k,,

such that
 * the first index, b,,0,,, is conventionally 0 (note that the first element in the sequence has index = 1)
 * the other indices point to an element that is a 'feasible break'

There are two simple rules stating whether an element is a feasible break or not: x,,i,, is a feasible break if either:

 1. x,,i,, is a glue element, and x,,i-1,, is a box; or
 1. x,,i,, is a penalty element and p,,i,, is not +infinity

The algorithm must respect the forced breaks: this means that this set must contain the indices of all the penalties whose value is -infinity; in particular b,,k,, = n.

== Element suppression ==

Having decided where the lines end, it's time to decide where they begin.

... coming soon ...

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-commits-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-commits-help@xmlgraphics.apache.org