You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-commits@xmlgraphics.apache.org by Apache Wiki <wi...@apache.org> on 2006/06/01 17:06:09 UTC

[Xmlgraphics-fop Wiki] Update of "GoogleSummerOfCode2006/FloatsImplementationProgress" by VincentHennebert

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xmlgraphics-fop Wiki" for change notification.

The following page has been changed by VincentHennebert:
http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006/FloatsImplementationProgress

The comment on the change is:
Pagination of Complex Documents: theoretical background

------------------------------------------------------------------------------
   * the line-breaking algorithm isn't designed to handle a floating sequence of g/b/p items. That is, a subsequence that is not fixed in the larger general sequence, but may be put at other places and possibly splitted.  The algorithm will have to be extended, probably by the algorithm found in "Pagination Reconsidered";
   * All what have been stated so far may be invalidated by further readings...
  
+ == "On the Pagination of Complex Documents" ==
+ From Brüggeman-Klein, Klein, Wohlfeil. This article may be found [http://wwwpi6.fernuni-hagen.de/Publikationen/tr234.pdf here].
+ Also, from the same authors, the article [http://wwwpi6.fernuni-hagen.de/Publikationen/tr205.pdf "Pagination Reconsidered"] gives the details of the algorithm.
+ 
+ === Some Theoretical Background ===
+ This article let me understand some abstract background of document layout. In fact, document layout is somewhat similar to a [http://en.wikipedia.org/wiki/Bin_packing_problem bin packing problem]: bins correspond to pages, objects to typeset material of some height. Those objects usually come from different flows: before-floats, main flow, footnotes. The order of each flow must be respected (a figure appearing after another one in the list must appear later on the pages), but there is some freedom in choosing from which flow the next typeset element will be taken.
+ 
+ An optimal way of putting those elements on the pages must be found. In other words there is some cost function to minimize. This cost function should reflect what a human would call a good pagination; among various criteria we may cite the following: figures should appear as near to their first reference as possible; footnotes ''must'' appear on the same page as their citation; facing pages (or columns) should be balanced; etc.
+ 
+ As for packing problems, page layout will usually be a [http://en.wikipedia.org/wiki/Combinatorics combinatorial] NP-hard problem. However, if the cost function has some particular properties (namely [http://en.wikipedia.org/wiki/Optimal_substructure optimal substructure]), it becomes possible to solve it in polynomial time by using [http://en.wikipedia.org/wiki/Dynamic_programming dynamic programming]. The cost function used by Knuth and Plass for computing demerits of each line of a paragraph has such a property. 
+ 
+ For the page layout problem, the main issue is then to find a cost function:
+  * which correctly reflects the quality of a layout;
+  * which has the right properties to make it possible to use dynamic programming for minimizing it.
+ 
+ Note that other techniques than dynamic programming may be used to layout pages. For example, by using some heuristic approach combined to genetic algorithms or simulated annealing (like in [http://citeseer.ist.psu.edu/johari97automatic.html this article]). But exploring those approaches would lead us much too far. Let's concentrate on the dynamic programming approach implemented in Fop.
+ 

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-commits-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-commits-help@xmlgraphics.apache.org