Posted to fop-commits@xmlgraphics.apache.org by Apache Wiki <wi...@apache.org> on 2005/11/08 15:59:31 UTC

[Xmlgraphics-fop Wiki] Update of "LineBreaking" by ManuelMall

The following page has been changed by ManuelMall:
http://wiki.apache.org/xmlgraphics-fop/LineBreaking

The comment on the change is:
Documents the Knuth sequences for line breaking and justification used in FOP

New page:
This page looks at issues around the generation of Knuth elements for line break possibilities. It does not deal with determining line break possibilities but concentrates only on the Knuth elements to be generated for a given line break possibility. Because it is closely related, it also covers the Knuth elements required for text justification, that is, the Knuth elements generated for elastic spaces.

The following shorthands are used in the sample sequences:
 * spb-start = the sum of the space-start, border-start and padding-start lengths
 * spb-end = the sum of the space-end, border-end and padding-end lengths
 * sp-width = the width of a nominal space character
 * hyp-width = the width of a hyphenation character

= Commonly occurring Knuth sequences =

== A simple break ==
{{{
1  pen   w="0" p="0"
}}}

== A forced break ==
{{{
1  pen   w="0" p="-INF"
}}}

== Space/Border/Padding around a break ==
Space/border/padding on one or both sides of a break is a common occurrence. The generic Knuth sequence for such a situation is:
{{{
1  glue  w="spb-end"
2  pen   w="0" p="0"
3  glue  w="- (spb-end + spb-start)"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"
}}}
Explanation:
{{{
element 1 is a legal break point, but it is never chosen as 2 is better
element 2 is a legal break point: if it is chosen, the ending line will
   reserve a width of spb-end for border and padding, and the next line will
   reserve a width of spb-start (the glue 3 is discarded)
element 3 is NOT a legal break because of the preceding penalty
element 4 prevents element 6 from being discarded if element 2 is chosen as a break
element 5 is NOT a legal break because of its value
element 6 is NOT a legal break because of the preceding penalty
if there is no break, the overall width is spb-end + (-(spb-end + spb-start)) + spb-start = 0
}}}
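The explanation above can be checked mechanically. The following Python sketch models the Knuth-Plass rules it relies on: a penalty is a legal break if p < INF, a glue only if it directly follows a box, and after a break all glue and penalties up to the next box are discarded. The classes and the functions `is_legal_break`, `natural` and `break_at` are illustrative assumptions rather than FOP's implementation, and the widths (spb-end = 2, spb-start = 5, word boxes of 30 and 40) are arbitrary. A word box sits at index 0 so that element 1 has a box before it and each list index matches the element number.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf  # p = +INF forbids a break, p = -INF forces one

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def is_legal_break(seq, i):
    """Knuth-Plass rule: a penalty with p < +INF, or a glue directly preceded by a box."""
    e = seq[i]
    if isinstance(e, Penalty):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def natural(seq):
    """(width, stretch) when set without a break; penalty widths only count at a chosen break."""
    width = sum(e.w for e in seq if not isinstance(e, Penalty))
    stretch = sum(e.stretch for e in seq if isinstance(e, Glue))
    return width, stretch

def break_at(seq, i):
    """Break at index i; return (width, stretch) before and after the break."""
    assert is_legal_break(seq, i)
    before_w, before_st = natural(seq[:i])  # a glue chosen as the break is itself dropped
    if isinstance(seq[i], Penalty):
        before_w += seq[i].w                # a penalty chosen as the break adds its width
    j = i + 1
    while j < len(seq) and not isinstance(seq[j], Box):
        j += 1                              # discard glue/penalties up to the next box
    return (before_w, before_st), natural(seq[j:])

SPB_END, SPB_START = 2, 5            # arbitrary example widths
seq = [
    Box(30),                         # word before the break (index 0)
    Glue(SPB_END),                   # element 1
    Penalty(0, 0),                   # element 2
    Glue(-(SPB_END + SPB_START)),    # element 3
    Box(0),                          # element 4
    Penalty(0, INF),                 # element 5
    Glue(SPB_START),                 # element 6
    Box(40),                         # word after the break
]

print([i for i in range(len(seq)) if is_legal_break(seq, i)])  # [1, 2]
print(natural(seq))      # (70, 0): 30 + 40, the break context adds nothing
print(break_at(seq, 2))  # ((32, 0), (45, 0)): spb-end ends the line, spb-start opens the next
}}}

Breaking at element 2 leaves spb-end on the ending line and spb-start on the next one, while the unbroken line is exactly as wide as the two words.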

== Alignments ==
=== Center alignment ===
For center alignment (text-align="center") a constant stretch is added on both sides of the break:
{{{
1  glue  w="0" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="0" stretch="- 6 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="0" stretch="3 * sp-width" shrink="0"
}}}

=== Left/right alignment ===
For left or right alignment (text-align="left" or text-align="right") a constant stretch is added at the end of the line:
{{{
1  glue  w="0" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="0" stretch="- 3 * sp-width" shrink="0"
}}}
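With the same kind of model, a sketch (hypothetical helper names, arbitrary sp-width = 10, widths left at zero) can measure the stretch that ends one line and starts the next. The centered sequence here includes a zero-width box before the INF penalty, as the combined Space/Border/Padding variant further down does; a version without that box is shown for comparison, because without it the trailing glue is discarded together with the other elements after the break.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def stretch(seq):
    """Total stretch of the glue elements in seq."""
    return sum(e.stretch for e in seq if isinstance(e, Glue))

def break_after_penalty(seq, i):
    """Stretch before and after breaking at penalty i; glue/penalties after it
    are discarded up to the first box (Knuth-Plass rule)."""
    assert isinstance(seq[i], Penalty) and seq[i].p < INF
    j = i + 1
    while j < len(seq) and not isinstance(seq[j], Box):
        j += 1
    return stretch(seq[:i]), stretch(seq[j:])

SP = 10  # hypothetical sp-width

center = [Glue(0, 3 * SP), Penalty(0, 0), Glue(0, -6 * SP),
          Box(0), Penalty(0, INF), Glue(0, 3 * SP)]
center_no_box = [e for e in center if not isinstance(e, Box)]
left_right = [Glue(0, 3 * SP), Penalty(0, 0), Glue(0, -3 * SP)]

print(stretch(center), stretch(left_right))   # 0 0: no net stretch without a break
print(break_after_penalty(center, 1))         # (30, 30): stretch on both sides
print(break_after_penalty(center_no_box, 1))  # (30, 0): the leading stretch is lost
print(break_after_penalty(left_right, 1))     # (30, 0): stretch only at the line end
}}}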

== Space/Border/Padding combined with Alignments ==
=== Space/Border/Padding with Center alignment ===
{{{
1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="- (spb-end + spb-start)" stretch="- 6 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start" stretch="3 * sp-width" shrink="0"
}}}

=== Space/Border/Padding with Left/Right alignment ===
{{{
1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="- (spb-end + spb-start)" stretch="- 3 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"
}}}
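Since the three Space/Border/Padding sequences differ only in their stretch values, they can be produced by one parameterized function. A sketch under that observation; `break_elements` and `totals` are hypothetical names, and any text_align value other than "center", "left" or "right" falls back to the plain Space/Border/Padding sequence.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def break_elements(spb_start, spb_end, sp_width, text_align=None):
    """Six-element break sequence with space/border/padding, optionally with alignment stretch."""
    if text_align == "center":
        end_st, mid_st, start_st = 3 * sp_width, -6 * sp_width, 3 * sp_width
    elif text_align in ("left", "right"):
        end_st, mid_st, start_st = 3 * sp_width, -3 * sp_width, 0
    else:
        end_st = mid_st = start_st = 0
    return [
        Glue(spb_end, end_st),                 # 1: stays at the end of the line
        Penalty(0, 0),                         # 2: the break itself
        Glue(-(spb_end + spb_start), mid_st),  # 3: cancels 1 and 6 when there is no break
        Box(0),                                # 4: keeps 6 from being discarded after a break
        Penalty(0, INF),                       # 5: forbids a break here
        Glue(spb_start, start_st),             # 6: opens the next line
    ]

def totals(seq):
    """(width, stretch) if no break is taken; penalties contribute nothing."""
    return (sum(e.w for e in seq if not isinstance(e, Penalty)),
            sum(e.stretch for e in seq if isinstance(e, Glue)))

for align in (None, "center", "left", "right"):
    print(align, totals(break_elements(5, 2, 10, align)))  # (0, 0) in every case
}}}

Whatever the alignment, taking no break leaves neither width nor stretch behind; the alignment only changes how stretch is distributed when element 2 is chosen.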


= Specific Knuth sequences =
The following cases have been identified:

1. Non breaking / non elastic
 Example: U+202F NARROW NO-BREAK SPACE
 
This is the normal character case, but it can also include some characters that Unicode classifies as spaces. A consecutive run of non-breaking / non-elastic characters with the same properties is mapped to a single Knuth box element whose width is the combined width of all the characters. It is important to aggregate them rather than generate individual box elements, so that kerning can be taken into account.
{{{
1  box w="<width of sequence>"
}}}
/!\ These box elements are not related to the identification of words in the text required by the hyphenation subsystem.

For example:
{{{
<fo:inline font-size="2em">B</fo:inline>argain
}}}
would generate:
{{{
1  box   w="width of 'B'"
2  box   w="width of 'argain'"
}}}
However, the hyphenation algorithm would need to be given the word: Bargain.
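The aggregation rule and the separate word view can be sketched together. The font metrics here are invented (a fixed advance of 500 units per character at 1em and a single kerning pair B/a of -40), and `run_width` and `boxes_and_word` are hypothetical helpers: consecutive characters with the same font size become one box whose width includes kerning between adjacent characters, while the word for hyphenation is collected across box boundaries.

{{{#!python
from itertools import groupby

ADVANCE = 500               # hypothetical advance per character at 1em (font units)
KERN = {("B", "a"): -40}    # hypothetical kerning pair, also at 1em

def run_width(text, size):
    """Width of a run set in one font size, including kerning between adjacent characters."""
    width = len(text) * ADVANCE
    width += sum(KERN.get(pair, 0) for pair in zip(text, text[1:]))
    return width * size

def boxes_and_word(chars):
    """chars: list of (character, font_size) pairs.

    Returns one box width per run of equal font size, plus the word text
    that the hyphenation subsystem would need to see."""
    boxes = []
    for size, group in groupby(chars, key=lambda c: c[1]):
        run = "".join(ch for ch, _ in group)
        boxes.append(run_width(run, size))
    word = "".join(ch for ch, _ in chars)
    return boxes, word

# <fo:inline font-size="2em">B</fo:inline>argain
chars = [("B", 2)] + [(ch, 1) for ch in "argain"]
print(boxes_and_word(chars))      # ([1000, 3000], 'Bargain')

same_size = [(ch, 1) for ch in "Bargain"]
print(boxes_and_word(same_size))  # ([3460], 'Bargain'): one box, so the B/a kern applies
}}}

In the first case 'B' and 'argain' end up in different boxes, so this sketch never applies the B/a pair there; set as a single run, the same letters form one box and the pair takes effect.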

2. Non breaking / elastic space 
 Example: U+00A0 Non breaking space

For this character class the Knuth elements must prevent a break from being generated, but the characters still participate in text justification.
 /!\ Whether a character falls into this class depends on the combination of the treat-as-word-space property and its Unicode value.

The Knuth sequence for text-align not equal to "justify":
{{{
1  pen   w="0" p="INF"
2  glue  w="sp-width" stretch="0" shrink="0"
}}}
and for text-align="justify":
{{{
1  pen   w="0" p="INF"
2  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
}}}
 /!\ The width, stretch and shrink values above depend on the word-spacing property.
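A sketch of the two variants, assuming the default word-spacing (so the values above apply unchanged) and using hypothetical names: `nbsp_elements` picks the elastic glue only for text-align="justify", and placing the result between two word boxes shows that neither variant offers a legal break point, since the penalty is INF and the glue follows a penalty rather than a box.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def nbsp_elements(sp_width, text_align):
    """Elements for U+00A0 (default word-spacing assumed): elastic only when justifying."""
    if text_align == "justify":
        space = Glue(sp_width, sp_width / 2, sp_width / 3)
    else:
        space = Glue(sp_width, 0, 0)
    return [Penalty(0, INF), space]

def is_legal_break(seq, i):
    """Knuth-Plass: a penalty below +INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Penalty):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

for align in ("left", "justify"):
    line = [Box(30)] + nbsp_elements(12, align) + [Box(40)]  # word NBSP word
    legal = [i for i in range(len(line)) if is_legal_break(line, i)]
    print(align, legal, line[2].stretch, line[2].shrink)  # left [] 0 0, then justify [] 6.0 4.0
}}}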

3. Break / non elastic

 Example: U+200B Zero Width Space

This type covers all break possibilities that do not add, remove or change any characters. However, when a break is generated, border and padding must be taken into account, as must certain text-align values. These sequences are therefore identical to the generic sequences above.
 /!\ In addition, a change in width due to kerning may need to be considered.

4. Break / non elastic / add character if break
 Example: Hyphenation

If something needs to be added at the end of the line when a break is generated, the Knuth solution is to give the break penalty a non-zero width. For hyphens the penalty is also flagged and given a non-zero penalty value:
{{{
1  pen   w="hyp-width" p="FLAGGED"
}}}
This combines easily with the common sequences for Space/Border/Padding and/or alignment. For example, the Knuth sequence for a hyphenation break possibility with Space/Border/Padding and text-align="center" would be:
{{{
1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="hyp-width" p="FLAGGED"
3  glue  w="- (spb-end + spb-start)" stretch="- 6 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start" stretch="3 * sp-width" shrink="0"
}}}
 /!\ This doesn't cater for change in spelling or kerning in the presence of hyphenation.
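Giving the penalty a width means the hyphen is paid for only when the break is actually taken. The sketch below applies this to the centered sequence above with arbitrary widths (hyp-width = 4, spb-end = 2, spb-start = 5, sp-width = 10, word boxes of 30 and 40). `FLAGGED_P = 50` is an assumed stand-in for p="FLAGGED" (50 is plain TeX's default \hyphenpenalty), and the helper names are hypothetical.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf
FLAGGED_P = 50  # assumed value standing in for p="FLAGGED"

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float
    flagged: bool = False

def natural(seq):
    """(width, stretch) without a break: penalty widths are ignored."""
    return (sum(e.w for e in seq if not isinstance(e, Penalty)),
            sum(e.stretch for e in seq if isinstance(e, Glue)))

def break_at_penalty(seq, i):
    """Split at penalty i: its width joins the line end; glue/penalties after it
    are discarded up to the first box."""
    w, st = natural(seq[:i])
    j = i + 1
    while j < len(seq) and not isinstance(seq[j], Box):
        j += 1
    return (w + seq[i].w, st), natural(seq[j:])

SPB_END, SPB_START, HYP, SP = 2, 5, 4, 10
hyphen_break = [
    Box(30),                                # "hy"
    Glue(SPB_END, 3 * SP),                  # element 1
    Penalty(HYP, FLAGGED_P, flagged=True),  # element 2: the hyphenation point
    Glue(-(SPB_END + SPB_START), -6 * SP),  # element 3
    Box(0),                                 # element 4
    Penalty(0, INF),                        # element 5
    Glue(SPB_START, 3 * SP),                # element 6
    Box(40),                                # "phen"
]
print(natural(hyphen_break))              # (70, 0): no hyphen, no extra width
print(break_at_penalty(hyphen_break, 2))  # ((36, 30), (45, 30)): hyphen + spb-end, then spb-start
}}}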

5. Break / non elastic / remove if not break
 Example: U+00AD Soft hyphen

As these characters have zero width when no break occurs, their Knuth sequences are identical to those of the hyphenation case above.

6. Break / non elastic / removable
 Example: U+2000 EN QUAD and other fixed width spaces

The Knuth algorithm discards all glue and penalty elements at the beginning of a line, so this sequence does the trick:
{{{
1  pen   w="0" p="0"
2  glue  w="char width"
}}}
Again, this can be combined with Space/Border/Padding and alignment, as this example for text-align="left" or text-align="right" shows:
{{{
1  glue  w="spb-end" stretch="3 * sp-width" shrink="0"
2  pen   w="0" p="0"
3  glue  w="char width - (spb-end + spb-start)" stretch="- 3 * sp-width" shrink="0"
4  box   w="0"
5  pen   w="0" p="INF"
6  glue  w="spb-start"
}}}
 /!\ XSL-FO does not define these characters as removable white space, but would common typesetting conventions remove them at a line break?
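Because the character width sits in glue element 3, which is discarded after a break at element 2, the space is kept when there is no break and vanishes when there is one. A sketch of the left/right example with arbitrary widths (char width = 9, spb-end = 2, spb-start = 5, sp-width = 10, word boxes of 30 and 40); the classes and helpers are illustrative, not FOP code.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def width(seq):
    """Natural width of elements set on one line (penalty widths ignored)."""
    return sum(e.w for e in seq if not isinstance(e, Penalty))

def split(seq, i):
    """Widths before and after a break at penalty i (Knuth-Plass discard rule)."""
    j = i + 1
    while j < len(seq) and not isinstance(seq[j], Box):
        j += 1
    return width(seq[:i]) + seq[i].w, width(seq[j:])

CHAR, SPB_END, SPB_START, SP = 9, 2, 5, 10
en_quad = [
    Box(30),
    Glue(SPB_END, 3 * SP),                        # element 1
    Penalty(0, 0),                                # element 2
    Glue(CHAR - (SPB_END + SPB_START), -3 * SP),  # element 3: carries the EN QUAD width
    Box(0),                                       # element 4
    Penalty(0, INF),                              # element 5
    Glue(SPB_START),                              # element 6
    Box(40),
]
print(width(en_quad))     # 79: 30 + 9 + 40, the space is kept
print(split(en_quad, 2))  # (32, 45): the EN QUAD width disappears at the break
}}}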

7. Break / elastic / non removable
 Example: U+3000 Ideographic space

This can be handled as a non-breaking space (case 2.) followed by a zero width space (case 3.). For example, for text-align="justify" with Space/Border/Padding:
{{{
1  pen   w="0" p="INF"
2  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
3  glue  w="spb-end"
4  pen   w="0" p="0"
5  glue  w="- (spb-end + spb-start)"
6  box   w="0"
7  pen   w="0" p="INF"
8  glue  w="spb-start"
}}}
 /!\ XSL-FO does not define U+3000 as removable white space, but would common CJK typesetting conventions remove it at a line break?
 /!\ The Unicode line breaking algorithm does not break before a space, as it assumes that spaces are removed from the end of a line. This is not the case here. Do we need to allow a break before the space?
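Concatenating the two sequences can be checked directly. In this sketch (hypothetical helpers; arbitrary widths sp-width = 12, spb-end = 2, spb-start = 5, word boxes of 30 and 40) the only legal break is element 4, and taking it keeps the ideographic space, with its stretch and shrink, at the end of the line instead of discarding it.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def is_legal_break(seq, i):
    """Knuth-Plass: a penalty below +INF, or a glue directly after a box."""
    e = seq[i]
    if isinstance(e, Penalty):
        return e.p < INF
    return isinstance(e, Glue) and i > 0 and isinstance(seq[i - 1], Box)

def width(seq):
    """Natural width of elements set on one line (penalty widths ignored)."""
    return sum(e.w for e in seq if not isinstance(e, Penalty))

SP, SPB_END, SPB_START = 12, 2, 5
nbsp_part = [Penalty(0, INF), Glue(SP, SP / 2, SP / 3)]          # case 2, justified
zwsp_part = [Glue(SPB_END), Penalty(0, 0), Glue(-(SPB_END + SPB_START)),
             Box(0), Penalty(0, INF), Glue(SPB_START)]            # case 3
line = [Box(30)] + nbsp_part + zwsp_part + [Box(40)]

legal = [i for i in range(len(line)) if is_legal_break(line, i)]
print(legal)                   # [4]: listing element 4 (index 0 is the word box)
print(width(line[:legal[0]]))  # 44: word + ideographic space + spb-end before the break
}}}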

8. Break / elastic / removable
 Example: U+0020 Space

If white-space-collapse="false" and white-space-treatment="ignore...", there can be a run of spaces that must be removed if a break is generated. Assuming each space generates its own glue element (or at least that there may be multiple glue elements if the spaces cross FO boundaries), the simplest case gives a sequence similar to case 6:
{{{
1  pen   w="0" p="0"
2  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
3  pen   w="0" p="INF"
4  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
...
n-1 pen  w="0" p="INF"
n  glue  w="sp-width" stretch="sp-width / 2" shrink="sp-width / 3"
}}}
Again, this can be combined with the common Space/Border/Padding and/or alignment sequences.
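A sketch of the simplest case for a run of n spaces, with a hypothetical `space_run` generator and arbitrary widths (sp-width = 12, word boxes of 30 and 40): a break at the leading penalty discards the whole run from the start of the next line, while without a break all n spaces keep their width.

{{{#!python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Box:
    w: float

@dataclass
class Glue:
    w: float
    stretch: float = 0
    shrink: float = 0

@dataclass
class Penalty:
    w: float
    p: float

def space_run(n, sp_width):
    """Penalty 0 before the first space, INF penalties before the following ones."""
    elements = []
    for k in range(n):
        elements.append(Penalty(0, 0 if k == 0 else INF))
        elements.append(Glue(sp_width, sp_width / 2, sp_width / 3))
    return elements

def width(seq):
    """Natural width of elements set on one line (penalty widths ignored)."""
    return sum(e.w for e in seq if not isinstance(e, Penalty))

def after_break(seq, i):
    """Elements that start the next line after breaking at i (Knuth-Plass discard rule)."""
    j = i + 1
    while j < len(seq) and not isinstance(seq[j], Box):
        j += 1
    return seq[j:]

line = [Box(30)] + space_run(3, 12) + [Box(40)]
print(width(line))                  # 106: 30 + 3 * 12 + 40
print(width(after_break(line, 1)))  # 40: the whole run is discarded
}}}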
