Posted to fop-dev@xmlgraphics.apache.org by Luca Furini <lf...@cs.unibo.it> on 2004/05/19 12:08:50 UTC

Justification and line breaking

   Hi all

I am still thinking about justification and the more general problem of
line-breaking, and I have come to think that it's quite "strange" that the
LineLayoutManager should make choices about breaking points using only the
information provided by the TextLayoutManagers, while it should have a
wider knowledge of all the text.
(I see bug 28706 as an example of this strangeness: the LLM wants the TLM
to say if there is other text after the returned BreakPoss, but the TLM
doesn't know of the other TLMs' text)

At the moment, lines are built one at a time, and in "normal" cases only
underfull lines are taken into account: as both bpDim and availIPD have
.min == .opt == .max, no BreakPoss is added to vecPossEnd and the chosen
one is simply the last "short" BP returned by a TLM.
Even if bpDim had .min != .max, the choice would be made between a few
alternatives for the current line, without considering what will happen
next; this could generate an output alternating tight and loose lines,
which is not very beautiful.
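
To make the one-line-at-a-time behaviour concrete, here is a minimal sketch
in Java. It is not FOP code: the class FirstFitBreaker, its methods and the
sample widths are all hypothetical, and it assumes a paragraph made only of
words separated by spaces of one fixed natural width. Each line takes every
word that still fits, and the leftover width ("slack") is what justification
would then have to spread over the spaces of that line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of first-fit line building; not FOP code.
class FirstFitBreaker {

    // Returns the index of the first word on each line. A word stays on the
    // current line whenever it still fits at the natural space width.
    static List<Integer> lineStarts(int[] wordWidths, int space, int lineWidth) {
        List<Integer> starts = new ArrayList<>();
        int used = 0; // natural width of the line being filled
        for (int i = 0; i < wordWidths.length; i++) {
            if (i == 0 || used + space + wordWidths[i] > lineWidth) {
                starts.add(i); // word i opens a new line
                used = wordWidths[i];
            } else {
                used += space + wordWidths[i];
            }
        }
        return starts;
    }

    // Width each line would have to gain to reach lineWidth when justified.
    static int[] slackPerLine(int[] wordWidths, int space, int lineWidth,
                              List<Integer> starts) {
        int[] slack = new int[starts.size()];
        for (int line = 0; line < starts.size(); line++) {
            int from = starts.get(line);
            int to = (line + 1 < starts.size()) ? starts.get(line + 1)
                                                : wordWidths.length;
            int natural = (to - from - 1) * space; // spaces between words
            for (int i = from; i < to; i++) {
                natural += wordWidths[i];
            }
            slack[line] = lineWidth - natural;
        }
        return slack;
    }

    public static void main(String[] args) {
        int[] words = {5, 3, 2, 3, 2, 6, 4};
        List<Integer> starts = lineStarts(words, 1, 12);
        // prints: [0, 3, 5] [0, 6, 1]
        System.out.println(starts + " "
                + Arrays.toString(slackPerLine(words, 1, 12, starts)));
    }
}
```

With these sample widths the first line is exactly full while the second
must absorb 6 units in its single space: a tight line followed by a very
loose one.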

So, I have tried to implement Knuth's line-breaking algorithm [1], which
calculates breaking points after having gathered information about a whole
paragraph.
Here are a few advantages of this algorithm:
- first of all, the output is very beautiful; there is not a big
  difference in width between spaces in consecutive lines, and the max
  space width is smaller than before
- the interaction between LLM and TLM is much the same; the TLM returns a
  different, much smaller kind of object
- the TLM code is simplified a bit, as it no longer has to handle leading
  spaces or calculate flags (which IMO are line-related rather than
  text-related)
- the LLM now can quite easily handle properties such as text-indent,
  text-align-last, word-spacing and letter-spacing
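
The whole-paragraph idea can be sketched as a small dynamic program in Java.
This is a heavily simplified illustration, not the patch and not the full
algorithm from [1]: it assumes words and stretchable/shrinkable spaces only
(no penalties, hyphenation, fitness classes or tolerance limit), it compares
every pair of break positions instead of keeping a list of active nodes, and
it leaves the last line unjustified. The class TotalFitBreaker, the HOPELESS
constant and the sample widths are hypothetical; the per-line cost
(1 + 100*|r|^3)^2, with r the adjustment ratio, is modelled on the
badness and demerits of the paper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified total-fit breaker; not FOP code.
class TotalFitBreaker {

    // Large finite cost for a one-word line that cannot be adjusted, so that
    // some solution always exists (assumption of this sketch).
    static final double HOPELESS = 1e12;

    // Cost of one line, given its slack (lineWidth minus natural width) and
    // the number of spaces it contains.
    static double demerits(int slack, int gaps, int stretch, int shrink,
                           boolean lastLine) {
        if (lastLine && slack >= 0) return 0.0; // last line is not justified
        if (slack == 0) return 1.0;             // perfect fit
        int flex = gaps * (slack > 0 ? stretch : shrink);
        if (flex == 0) return gaps == 0 ? HOPELESS : Double.POSITIVE_INFINITY;
        double r = (double) slack / flex;       // adjustment ratio
        if (r < -1.0) return Double.POSITIVE_INFINITY; // cannot shrink enough
        double badness = 100.0 * Math.pow(Math.abs(r), 3);
        return (1.0 + badness) * (1.0 + badness);
    }

    // Returns the index of the first word on each line, chosen so that the
    // sum of the demerits of all lines is minimal.
    static List<Integer> lineStarts(int[] w, int space, int stretch,
                                    int shrink, int lineWidth) {
        int n = w.length;
        double[] best = new double[n + 1]; // best[j]: min cost of words [0, j)
        int[] from = new int[n + 1];       // start of the line ending at j
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            int natural = 0;
            for (int start = end - 1; start >= 0; start--) {
                natural += w[start] + (start < end - 1 ? space : 0);
                int gaps = end - start - 1;
                // Longer candidate lines can only be wider: stop looking.
                if (gaps > 0 && natural - gaps * shrink > lineWidth) break;
                double cost = best[start] + demerits(lineWidth - natural,
                        gaps, stretch, shrink, end == n);
                if (cost < best[end]) {
                    best[end] = cost;
                    from[end] = start;
                }
            }
        }
        Deque<Integer> starts = new ArrayDeque<>();
        for (int end = n; end > 0; end = from[end]) {
            starts.addFirst(from[end]); // walk the chosen breaks backwards
        }
        return new ArrayList<>(starts);
    }

    public static void main(String[] args) {
        int[] words = {5, 3, 2, 3, 2, 6, 4};
        // space 1, stretch 2, shrink 0, line width 12; prints: [0, 2, 5]
        System.out.println(lineStarts(words, 1, 2, 0, 12));
    }
}
```

For these sample widths, filling each line as far as possible would start
lines at words 0, 3 and 5, leaving slacks of 0 and 6 on the first two lines.
The sketch instead moves the third word down, so both justified lines have a
slack of 3.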

Could I open a bugzilla issue and attach a patch? It would be quite a raw
patch, as I took some shortcuts to make it work and there could be some
useless variables; anyway, it works and could be used to show the quality
of the output. I have tested it only with text-only blocks, so I don't know
what could happen in more complex situations.

Regards
    Luca

[1] D. E. Knuth and M. F. Plass, "Breaking paragraphs into lines"; I found
this essay in D. E. Knuth, "Digital typography", published by CSLI
Publications

Re: Justification and line breaking

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Peter B. West wrote:
> Do you know of a web-accessible version of the paper, or summary of the 
> algorithm?

Try The TeXbook, available as TeX source from your nearest
CTAN server. The description is, umm, somewhat obscure, so you
should get the commented TeX source (the .web files) as well.

J.Pietschmann

Re: Justification and line breaking

Posted by "Peter B. West" <pb...@tpg.com.au>.

Luca,

Do you know of a web-accessible version of the paper, or summary of the 
algorithm?

Peter

-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>

Re: Justification and line breaking

Posted by Simon Pepping <sp...@leverkruid.nl>.

On Wed, May 19, 2004 at 12:08:50PM +0200, Luca Furini wrote:
> 
>    Hi all
> 
> So, I have tried to implement Knuth's line-breaking algorithm [1], which
> calculates breaking points after having gathered information about a whole
> paragraph.
> Here are a few advantages of this algorithm:
> - first of all, the output is very beautiful; there is not a big
>   difference in width between spaces in consecutive lines, and the max
>   space width is smaller than before
> - the interaction between LLM and TLM is much the same; the TLM returns a
>   different, much smaller kind of object
> - the TLM code is simplified a bit, as it no longer has to handle leading
>   spaces or calculate flags (which IMO are line-related rather than
>   text-related)
> - the LLM now can quite easily handle properties such as text-indent,
>   text-align-last, word-spacing and letter-spacing

Wow. No question that this is a desirable thing to have. I thought
it would be several releases before we could address this.
But if you have a viable solution, by all means show it to us.

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl

Re: Justification and line breaking

Posted by Chris Bowditch <bo...@hotmail.com>.

Luca Furini wrote:

> I am still thinking about justification and the more general problem of
> line-breaking, and I have come to think that it's quite "strange" that the
> LineLayoutManager should make choices about breaking points using only the
> information provided by the TextLayoutManagers, while it should have a
> wider knowledge of all the text.
> (I see bug 28706 as an example of this strangeness: the LLM wants the TLM
> to say if there is other text after the returned BreakPoss, but the TLM
> doesn't know of the other TLMs' text)

Bug 28706 is still a bit of a mystery to me, or at least the disappearing
text is, as I don't have an example of it.

> At the moment, lines are built one at a time, and in "normal" cases only
> underfull lines are taken into account: as both bpDim and availIPD have
> .min == .opt == .max, no BreakPoss is added to vecPossEnd and the chosen
> one is simply the last "short" BP returned by a TLM.
> Even if bpDim had .min != .max, the choice would be made between a few
> alternatives for the current line, without considering what will happen
> next; this could generate an output alternating tight and loose lines,
> which is not very beautiful.
> 
> So, I have tried to implement Knuth's line-breaking algorithm [1], which
> calculates breaking points after having gathered information about a whole
> paragraph.
> Here are a few advantages of this algorithm:
> - first of all, the output is very beautiful; there is not a big
>   difference in width between spaces in consecutive lines, and the max
>   space width is smaller than before
> - the interaction between LLM and TLM is much the same; the TLM returns a
>   different, much smaller kind of object
> - the TLM code is simplified a bit, as it no longer has to handle leading
>   spaces or calculate flags (which IMO are line-related rather than
>   text-related)
> - the LLM now can quite easily handle properties such as text-indent,
>   text-align-last, word-spacing and letter-spacing
> 
> Could I open a bugzilla issue and attach a patch? It would be quite a raw
> patch, as I took some shortcuts to make it work and there could be some
> useless variables; anyway, it works and could be used to show the quality
> of the output. I have tested it only with text-only blocks, so I don't know
> what could happen in more complex situations.

This sounds like a really good idea, and I would be very pleased if you
could open a new bug in Bugzilla and attach your patch. It will probably
need a lengthy review involving plenty of testing and cleaning up.

Chris