You are viewing a plain text version of this content. The canonical link for it is here.

Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2004/08/03 14:41:52 UTC

DO NOT REPLY [Bug 29124] - New line breaking algorithm

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=29124>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=29124

New line breaking algorithm





------- Additional Comments From lfurini@cs.unibo.it  2004-08-03 12:41 -------

Hi all

At long last, I have finished the patch implementing Knuth's line breaking 
algorithm; it took me more than I expected, mainly because of a long sequence 
of hw and sw troubles ... Murphy's laws are not something to laugh at! :-)

I have worked on [Line, Text, InlineStacking, LeafNode]LM, so the algorithm 
should work well with any fo file containing text, leaders, characters, 
inlines and the other formatting objects handled by a LeafNodeLM (external 
graphics, pagenumbers and citations).

The general idea of Knuth algorithm is:

  try to find breaking points without hyphenating words
  if this fails
    hyphenate all words
    try again

The "hyphenate all words" phase could be time-expansive, so this step is 
performed trying to use as much as possible the information already known, and 
to minimize the changes to the existing sequence of elements.
The old sequence is used to collect word fragments, and elements are replaced 
only if the LM which created them has something to change.
So, "hyphenate all words" means:

  scan the old sequence once:
    collect word fragments
    hyphenate word
  scan the old sequence once more:
    if the LM which returned this element has changed something
      replace all elements returned by this LM
 
These are the new methods added; at the moment, I added them to the 
LayoutManager interface, but maybe it could be better to create a new 
interface implementd only by LM returning inline areas.

+ getNextKnuthElement()
This is used instead of getNextBreakPoss().
The next step (I have already started working on it) would be to use the same 
method all LM use.

+ addALetterSpaceTo()
The low-level LMs (TLMs, LNLMs) have only a partial view of the text, and 
therefore cannot know the exact number of letter spaces, while the LineLM has 
a full view.
If a TLM's text is "Tex", it can only suppose it has 2 letter spaces; if the 
following formatting object is a character "t", the LineLM tells the TLM to 
add a letter space, as the "x" is not the last letter of the word.

+ getWordChars()
This is not a new method, it just has different parameters; text is collected 
from fo:characters too.

+ hyphenate()
The TLM does not apply the changes to vecAreaInfo immediately, otherwise the 
existing Position objects stored in the old sequence couldn't be used any 
more. The LeafNodeLM returns a single area, so it can apply changes immediatly.

+ applyChanges()
This method tells the TLM to apply the changes to vecAreaInfo; all LM returns 
true if something is changed or false otherwise, so the LLM knows whether it 
has to replace the old elements or not.

+ getChangedKnuthElement()
This is used by the LLM to obtain the new elements.

+ getWordSpaceIPD()
This is used by the LLM to ask for the word space dimension; the LLM needs it 
to center text.

A few details to fix:

- word spacing and letter spacing are now fully implemented, they can both 
have MinOptMax values; but I am still thinking about how to differentiate a 
user-defined zero value from a default zero value ...

- Leaders with leader-pattern = "rule" or "space" work well; with "dots" the 
space left is right, but the dots don't fill it properly. Leaders with leader-
pattern="use-content" don't work, as the ContentLayoutManager has at the 
moment only a null implementation of the method getNextKnuthElement.
There is also a minor bug concerning (IMO) white space handling: if there 
white space both before and after the leader, the latter one is removed, so 
instead of
  word ______ word
the output shows
  word ______word

- with the other fo elements (fo:externalgraphic, fo:page-number and fo:page-
number-citation) the LeafNodeLM behave exactly the same way as with the old 
code, i.e. a fo:page-number-citation generates a "?  ".

- text-align-last is partially implemented; text-align-last = "justify" works 
only if text-align = "justify" too; this is because Knuth's algorithm doesn't 
provide for a different alignment for the last line.

I'm going to attach:
- the patch to existing files and new files
- a test fo file

Re: [Bug 29124] - New line breaking algorithm

Posted by Simon Pepping <sp...@leverkruid.nl>.

On Tue, Aug 03, 2004 at 12:41:52PM -0000, bugzilla@apache.org wrote:
> ------- Additional Comments From lfurini@cs.unibo.it  2004-08-03 12:41 -------
> These are the new methods added; at the moment, I added them to the 
> LayoutManager interface, but maybe it could be better to create a new 
> interface implementd only by LM returning inline areas.

I think it is, but that can come later.

> - word spacing and letter spacing are now fully implemented, they can both 
> have MinOptMax values; but I am still thinking about how to differentiate a 
> user-defined zero value from a default zero value ...

You cannot. A default value is a user-defined value supplied by the
system to save the user the trouble of always having to enter a
value. It is just a convenience, and you cannot attach a different
meaning to it.

> - text-align-last is partially implemented; text-align-last = "justify" works 
> only if text-align = "justify" too; this is because Knuth's algorithm doesn't 
> provide for a different alignment for the last line.

TeX uses glue to achieve this, \parfillskip. It is the special amount
of glue appended to the last line. In the TeXbook, p. 99, Knuth
describes it as 'the special trick that allows the final line of a
paragraph to be shorter than the others'. Setting \parfillskip to 0
removes this ability. Usually \parfillskip has infinite
stretchability.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl

[Bug 29124] - New line breaking algorithm

Posted by Simon Pepping <sp...@leverkruid.nl>.

On Tue, Aug 03, 2004 at 12:41:52PM -0000, bugzilla@apache.org wrote:
> ------- Additional Comments From lfurini@cs.unibo.it  2004-08-03 12:41 -------
> 
> Hi all
> 
> At long last, I have finished the patch implementing Knuth's line breaking 
> algorithm; it took me more than I expected, mainly because of a long sequence 
> of hw and sw troubles ... Murphy's laws are not something to laugh at! :-)

That is great. Congratulations with the successful completion of this
work. I will look at it soon, but since I will have holidays, it will
take a bit longer. I am curious to see how you negotiated all the
descendant LMs. 

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl