Posted to fop-dev@xmlgraphics.apache.org by "J.Pietschmann" <j3...@yahoo.de> on 2003/11/12 22:45:37 UTC

RT: line breaking

Victor Mote wrote:
> I know of at least two line-breaking strategies that we probably want to
> have in our stock strategies: 1) the line-by-line method used right now, and
> 2) a Tex-like paragraph-oriented strategy, which AFAIK doesn't exist yet.

Ahem, that's not what I meant, nor is it the scope of UTR14. UTR14 provides
"line break opportunities": for example, you can break foo-bar after the
hyphen but not 789-123. Which opportunities are used is another matter.
FOP's current algorithm for determining line break opportunities is utterly
simplistic, basically "possibly break before any breaking space, or after
a hyphen or slash"; the latter is done if hyphenation is enabled.
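
To make the foo-bar versus 789-123 distinction concrete, here is a minimal
sketch in Python. The function names are hypothetical, and the digit check is
a hand-written approximation of one effect of UTR14, not the UTR14 algorithm
and not FOP's actual code.

```python
def simple_break_opportunities(text, hyphenation_enabled=True):
    """Return every index i such that a break is allowed before text[i].

    Mirrors the simplistic rule described above: break before a space,
    or after a hyphen or slash when hyphenation is enabled.
    """
    opportunities = []
    for i in range(1, len(text)):
        if text[i] == " ":
            opportunities.append(i)
        elif hyphenation_enabled and text[i - 1] in "-/":
            opportunities.append(i)
    return opportunities


def digit_aware_break_opportunities(text, hyphenation_enabled=True):
    """Same as above, but never break after a hyphen between two digits."""
    result = []
    for i in simple_break_opportunities(text, hyphenation_enabled):
        after_hyphen = text[i - 1] == "-"
        if after_hyphen and i >= 2 and text[i - 2].isdigit() and text[i].isdigit():
            continue  # 789-123 stays in one piece
        result.append(i)
    return result
```

The simple rule offers a break inside both strings, while the digit-aware
variant keeps the break in foo-bar and drops the one in 789-123.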

I omitted the forced line break issue, which is also in the UTR14 scope,
and hyphenation, which may lead to additional line break opportunities
but is outside of the UTR14 scope.

> In your URL example, couldn't FOP see the "x-url" language & automatically
> add or assume the glue characters for the user? That would perhaps make it
> less obtrusive (I assume that you meant for the user).

Well, yes.

> I don't see it there yet, but I am a little confused. It seems to me that
> line-breaking consists of at least these components: 1) character-based
> line-breaking opportunities (which UTR14 addresses), 2) word-based
> line-breaking opportunities (which hyphenation dictionaries and patterns
> address), and 3) some strategy for using these to find acceptable/optimal
> line breaks. It sounds like you have addressed at least 1 and 3 in your
> implementation.

Paragraph filling (your point 3) is not addressed.
Be careful with the various TRs: UTR14 does not deal with character
(rather: grapheme) or word boundaries, that's UAX #29. Actually, we
don't use the latter.
Our line breaking should probably be done the following way (this
implements the "naive" paragraph filling strategy):
   loop
     calculate line width if next character is added
     check for a line breaking opportunity before the next character
     if there is an opportunity
       if the line is not full
         discard the last saved opportunity and save this
       else
         try hyphenation on the string accumulated since the
           last break opportunity (if enabled), save returned
           opportunity if any
         return saved line breaking opportunity
       end if
     end if
   end loop

hyphenation of a string:
  loop
    skip non-word characters (for this hyphenator)
    word = continuous run of word characters (for this hyphenator)
    if the end of the word is past the end of the line
      try hyphenating the word, generate new break opportunities
      return best fitting line break opportunity or null
    end if
  end loop
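
The first loop above can be sketched in runnable form as follows. This is a
minimal Python sketch under loud assumptions: every character is one unit
wide unless a `char_width` function says otherwise, the only break
opportunities are before spaces, and the hyphenation step is left out
entirely. The names `fill_line` and `break_paragraph` are hypothetical.

```python
def fill_line(text, start, max_width, char_width=lambda ch: 1):
    """Return the index at which the line starting at `start` should break.

    Returns len(text) if the rest of the text fits, or None in the
    degenerate case where the line overflows before any opportunity is saved.
    """
    width = 0
    saved = None
    for i in range(start, len(text)):
        # Assumption: a break opportunity exists before every space.
        if text[i] == " " and i > start:
            if width <= max_width:
                saved = i          # discard the last saved opportunity, save this
            else:
                return saved       # line is full: use the last one that fitted
        width += char_width(text[i])
    return len(text) if width <= max_width else saved


def break_paragraph(text, max_width):
    """Naive paragraph filling: fill each line in turn, never look back."""
    lines, start = [], 0
    while start < len(text):
        end = fill_line(text, start, max_width)
        if end is None:
            # Degenerate case: let the line overflow up to the next space.
            end = text.find(" ", start + 1)
            if end == -1:
                end = len(text)
        lines.append(text[start:end])
        start = end
        while start < len(text) and text[start] == " ":
            start += 1             # drop the spaces at the break
    return lines
```

For example, `break_paragraph("aaa bb cc dddd", 6)` yields the lines
"aaa bb", "cc" and "dddd".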

There is the degenerate case if the line overflows and no line break
opportunity is discovered at all.
The TeX paragraph filling strategy has to detect line break opportunities
the same way but selects the opportunities turning into actual line breaks
in a more clever way. We could do that too.
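
As an illustration of "selecting in a more clever way", here is a small
Python sketch of a paragraph-wide choice of breaks. It is deliberately not
TeX's algorithm: it assumes one unit of width per character, breaks only
between words, and minimises the sum of squared leftover space per line
(the last line is free). The name `optimal_breaks` is hypothetical.

```python
def optimal_breaks(words, max_width):
    """Split words into lines minimising the sum of squared leftover space.

    The last line is not penalised. A word wider than max_width gets a
    line of its own (the degenerate overflow case).
    """
    n = len(words)
    best = [float("inf")] * (n + 1)   # best[i]: minimal cost of setting words[i:]
    line_end = [n] * (n + 1)          # line_end[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        width = -1                    # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            width += 1 + len(words[j - 1])
            slack = max_width - width
            if slack < 0 and j > i + 1:
                break                 # too long, and so is every longer line
            cost = 0 if j == n else max(slack, 0) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                line_end[i] = j
            if slack < 0:
                break                 # single overlong word: accept it and stop
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:line_end[i]]))
        i = line_end[i]
    return lines
```

With the words "aaa bb cc ddddd" and a width of 6, this returns "aaa",
"bb cc", "ddddd", whereas filling line by line would put "aaa bb" on the
first line and leave "cc" alone on a nearly empty second line.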

> This seems at least remotely related to fo.FOText.isWordChar(), which
> attempts to find breaks between words.

Actually, we don't need breaks between words. We need to identify line
breaking opportunities, words for the purpose of hyphenation, and
resizable spaces for justification.
That's why WordArea was such a bad name.

J.Pietschmann

Re: RT: line breaking

Posted by "Peter B. West" <pb...@powerup.com.au>.

J.Pietschmann wrote:
> Be careful with the various TRs: UTR14 does not deal with character
> (rather: grapheme) or word boundaries, that's UAX #29. Actually, we
> don't use the latter.
> Our line breaking should probably be done the following way (this
> implements the "naive" paragraph filling strategy):
>   loop
>     calculate line width if next character is added
>     check for a line breaking opportunity before the next character
>     if there is an opportunity
>       if the line is not full
>         discard the last saved opportunity and save this
>       else
>         try hyphenation on the string accumulated since the
>           last break opportunity (if enabled), save returned
>           opportunity if any
>         return saved line breaking opportunity
>       end if
>     end if
>   end loop
> 
> hyphenation of a string:
>  loop
>    skip non-word characters (for this hyphenator)
>    word = continuous run of word characters (for this hyphenator)
>    if the end of the word is past the end of the line
>      try hyphenating the word, generate new break opportunities
>      return best fitting line break opportunity or null
>    end if
>  end loop
> 
> There is the degenerate case if the line overflows and no line break
> opportunity is discovered at all.
> The TeX paragraph filling strategy has to detect line break opportunities
> the same way but selects the opportunities turning into actual line breaks
> in a more clever way. We could do that too.

In my own thinking about the process of line-breaking, I have always 
assumed that a (possibly recursive) block of text is a fixed resource; a 
superset of the fixed resource that is a single glyph/grapheme with 
given font attributes.  As such, it should be processed by a separate 
co-routine (to use the language of the Rec).  All of the information 
about the hierarchy of potential break positions is determined by the 
text itself.

As a first cut, I would determine all potential breaks, along
with information relevant to later line-height calculations, at the time
a block is first prepared for layout.  The co-routine (thread,
whatever) that is grooming the text would then respond to enquiries 
about line-area possibilities, and eventually return contents for 
line-areas of particular dimensions.  All of this is tentative, and all 
of the calculated information about the block would have to be held 
until the layout of the block is finalised.

What "finalised" means depends on the complexity of the layout 
strategies employed, but at a minimum, it must be maintained until the 
last page containing text from the block, and the subsequent page (if 
any) have been laid out, to allow for backtracking during last-page 
processing.
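
A rough Python sketch of that arrangement, with a plain object standing in
for the co-routine. Everything here is an assumption for illustration: the
class name `PreparedBlock`, the method `line_from`, one unit of width per
character, and break opportunities between words only.

```python
class PreparedBlock:
    """Holds the break information for a block until its layout is final."""

    def __init__(self, text):
        # Potential breaks are determined once, from the text itself.
        self.words = text.split()

    def line_from(self, first_word, width):
        """Answer an enquiry: what fits in a line-area of this width?

        Returns (line_text, next_word_index), or (None, first_word) when
        the block is exhausted. Nothing is consumed, so the caller can
        backtrack by asking again from an earlier index.
        """
        if first_word >= len(self.words):
            return None, first_word
        end = first_word + 1
        used = len(self.words[first_word])
        while end < len(self.words) and used + 1 + len(self.words[end]) <= width:
            used += 1 + len(self.words[end])
            end += 1
        return " ".join(self.words[first_word:end]), end
```

Because the object keeps no position of its own, a layout pass that has to
redo a page simply repeats its enquiries from the word index where that page
began, possibly with different widths.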

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>