Posted to fop-dev@xmlgraphics.apache.org by Dario Laera <la...@cs.unibo.it> on 2008/11/03 10:30:12 UTC

Re: Choosing a better threshold in line breaking

On 28 Oct 2008, at 13:53, Vincent Hennebert wrote:

> If you could run statistics on more real-life documents (how often is
> the first run without hyphenation sufficient, the third run required,
> justified and left-aligned text, single / two-column on A4 paper, etc),
> that would be fantastic.

I've run the examples in the repository with some debug info; you can
find the refined output in the attachment. The interesting output
lines are those with a high "lines" value (they show when long
paragraphs become difficult to break) and those following two
consecutive "RETRY" lines.
hyphen.fo was the most interesting case: it clearly shows that even
for a medium-sized paragraph (10 lines) th=1.0 plus hyphenation is not
enough. This is somewhat language dependent: Italian paragraphs don't
need a higher threshold, I think because Italian allows more
hyphenation points than languages like English, but I don't think we
should worry about this. I then tried formatting hyphen.fo using
th=5.0 for the second attempt, and it was always enough regardless of
the alignment. Finally, I formatted the same file with hyphenation
disabled and the results were mixed: sometimes the third attempt was
necessary, sometimes not.
The franklin*.fo files contain paragraphs longer than those in
hyphen.fo, but with hyphenation disabled, so those paragraphs get
broken at the second attempt even if they are start-aligned.
In inhprop.fo a center-aligned, non-hyphenated paragraph 4 lines long
falls back to forced mode; changing the alignment would make the
third attempt unnecessary.

The results of these tests can be summarized as follows:
  * non-hyphenated paragraphs are handled efficiently for both
justified and start alignment, as the second attempt is usually
sufficient (steps: 1.0, 5.0, 20.0);
  * hyphenated paragraphs would benefit from a th=5.0 attempt, which
is currently not performed (steps: 1.0, 1.0 + hyph, 20.0 + hyph);
  * center-aligned medium and long paragraphs are likely to need a
threshold higher than 5.0.
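
To make the step sequences above concrete, here is a minimal Python
sketch. It is not FOP code: the function names and the try_break
callback are hypothetical, and the "proposed" sequence is only my
assumption of what a th=5.0 second attempt with hyphenation (as in the
hyphen.fo experiment) would look like.

```python
from typing import Callable, List, Optional, Tuple

# One attempt = (threshold, hyphenation enabled)
Attempt = Tuple[float, bool]


def current_schedule(hyphenate: bool) -> List[Attempt]:
    """Attempt sequences as summarized in the list above."""
    if hyphenate:
        # steps: 1.0, 1.0 + hyph, 20.0 + hyph
        return [(1.0, False), (1.0, True), (20.0, True)]
    # steps: 1.0, 5.0, 20.0
    return [(1.0, False), (5.0, False), (20.0, False)]


def proposed_schedule(hyphenate: bool) -> List[Attempt]:
    """Assumption: hyphenated paragraphs get th=5.0 at the second attempt."""
    if hyphenate:
        return [(1.0, False), (5.0, True), (20.0, True)]
    return current_schedule(False)


def first_success(schedule: List[Attempt],
                  try_break: Callable[[float, bool], bool]) -> Optional[int]:
    """Return the 1-based number of the first attempt that succeeds.

    try_break is a hypothetical stand-in for the line breaker: it takes
    (threshold, hyphenate) and reports whether breakpoints were found.
    """
    for number, (threshold, hyphenate) in enumerate(schedule, start=1):
        if try_break(threshold, hyphenate):
            return number
    return None
```

For a paragraph that only breaks with a threshold of at least 5.0, the
current hyphenated sequence reaches the third attempt, while the
assumed one stops at the second.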

If you have a typical user XSL-FO file whose behavior is worth
examining, please send it to me.

Dario