You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Tony Graham <To...@Sun.COM> on 2003/02/03 16:15:17 UTC

RE: source for hz algorithm

Rhett Aultman wrote at 29 Jan 2003 10:33:21 -0500:
 > Any attempt FOP has at laying rules for layout is going to be
 > applying heuristics.  It would be interesting to be able to expose
 > the heuristics and allow for programmer/user control of them.
 > Still, until more of this materializes, I think we're getting
 > famous French mathematicians before the equine species (...getting
 > DesCartes before the horse, or "de cart before the horse"...bad
 > joke)

This is essentially what the Sun xmlroff XSL Formatter will do.

Unfortunately the only information that I can point to right now is my
slides from XML 2002, but after the area tree is built, there will be
an 'Area Tree Adjust' stage (yet to be written) where code in loadable
modules can tweak the area tree within the limits of the .minimum and
.maximum of the various properties in the formatting object tree.

Since different people have different speed-quality priorities, the
design allows choices for layout rules (once the choices are written,
however).

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


RE: source for hz algorithm

Posted by Victor Mote <vi...@outfitr.com>.
Oleg Tkachenko wrote:

> I've got this book too, good one, but too TeX-oriented IMO.

True enough that the book in general is TeX- & Metafont-oriented. However, I
thought the chapter on line-breaking was general enough to be very useful to
us.

Victor Mote


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


Re: source for hz algorithm

Posted by Oleg Tkachenko <ol...@multiconn.com>.
Victor Mote wrote:
> FYI, I now have the Knuth books "Digital Typography" and the 5-volume
> "Computers & Typesetting", and have perused the relevant sections. The h & j

> For any who are interested in line-breaking, I highly recommend at least
> reading through this material. The book has a lot of other interesting
> things as well, including a chapter (4) on bidi.
I've got this book too, good one, but too TeX-oriented IMO. Funny enough, the 
book was lying on my desk for ages and only now I noted it has a  very 
interesting chpater about line breaking ;)
-- 
Oleg Tkachenko
Multiconn Technologies, Israel


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


RE: source for hz algorithm

Posted by Victor Mote <vi...@outfitr.com>.
FYI, I now have the Knuth books "Digital Typography" and the 5-volume
"Computers & Typesetting", and have perused the relevant sections. The h & j
algorithm that I have been looking for can be found in Chapter 3 of "Digital
Typography". A brief history of line-breaking is included, as well as a
discussion of the theory, and the algorithm laid out in pretty good detail.
The mathematics of the solution will appeal to Rhett and probably several
others. The performance penalty is not so great as I thought. Here is an
excerpt:

<-- Start -->
But how is the computer to solve such a problem efficiently? When a given
paragraph has n optional breakpoints, there are 2^n ways to break it into
lines, and even the fastest conceivable computers could not run through all
such possibilities in a reasonable amount of time. In fact, the job of
breaking a paragraph into equal-sized lines as nicely as possible sounds
suspiciously like the infamous bin-packing problem, which is well known to
be NP-complete. Fortunately, however, each line will consist of contiguous
information from the paragraph, so the line-breaking problem is amenable to
the techniques of discrete dynamic programming; hence there is a reasonably
efficient way to attack it. We shall see that the optimum breakpoints can be
found in practice with only about twice as much computation as needed by the
traditional algorithm; the new method is sometimes even faster than the old,
when we consider the time saved by not needing to hyphenate so often....
<-- End -->

For any who are interested in line-breaking, I highly recommend at least
reading through this material. The book has a lot of other interesting
things as well, including a chapter (4) on bidi.

I'm hoping to be back into doing real work within a few days.

Victor Mote


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


RE: source for hz algorithm

Posted by Victor Mote <vi...@outfitr.com>.
Victor Mote wrote:

> Ek is actually hacking the font shapes. I have seen this or something
> similar done in FrameMaker, which gives the ability to stretch or condense
> glyphs on both the horizontal and vertical axes. However, FrameMaker also
> creates PDFs by generating Postscript code and Distilling it, so perhaps
> this is done in Postscript & embedded. I will have to look into this
> further.

I see now the Tz operator in PDF 1.4 (probably earlier versions as well)
that handles this. I am not entirely sure this is the same thing, but I
don't see why not.

Victor Mote


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


RE: source for hz algorithm

Posted by Victor Mote <vi...@outfitr.com>.
FYI, I received the Seybold Report on Publishing Systems (Vol 22, No. 11) a
few days ago & have read the article on hz-program. The article was
published in 1993, and is pretty print-oriented. If memory serves, this was
before PDF & electronic paper really caught on -- this is important because
there are some things that might be implemented in a RIP, that may not be
supported in PDF or other file formats. The article shed a little more light
on the subject. Basically, there are four components:
  1. kf -- kerning on the fly (including hanging punctuation & serifs)
  2. Kp or Kr (the second letter is the Greek letter rho) -- optical scaling
of fonts
  3. Ek -- slightly expands or condenses individual character widths to
reduce the need for word spacing
  4. jp -- justification per paragraph. This is the part that is based on
TeX.

The problems with kerning on the fly (ie. computing kerning instead of
looking it up in a table) are interesting. The article cited the example of
a "C" swallowing a "-" that followed it. It seems like what is needed is a
way to designate in the font an optical boundary for a character. For
example, the capital C might look (approximately) like this:


       /-----.
      |     .
      |    .
      |    .
      |     .
       \-----.

where the periods on the right side represent an artificial optical boundary
(perhaps it needs to more or less concave). Perhaps the boundary should be
at least line/arc segments on the left and right, or I suppose it could be a
completely enclosed structure. I don't see anything in the OpenType spec for
this, but it seems to me that the information could be computed, although
the mathematics might be interesting.

I /think/ that much of what Kp did is now built into OpenType fonts -- i.e.
intelligent adjustment of stroke weights based on point size.

Ek is actually hacking the font shapes. I have seen this or something
similar done in FrameMaker, which gives the ability to stretch or condense
glyphs on both the horizontal and vertical axes. However, FrameMaker also
creates PDFs by generating Postscript code and Distilling it, so perhaps
this is done in Postscript & embedded. I will have to look into this
further.

We have already discussed the jp-type functions some.

One real trick here is to find an algorithm that will tie all of this
together into a useable state. Should we have yet another wiki to outline
all of this? Or is it premature to even discuss it much? Also, perhaps some
of it ties in with the wiki on resolution of breaks.

Victor Mote


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org