Posted to fop-dev@xmlgraphics.apache.org by Manuel Mall <mm...@arcus.com.au> on 2005/11/07 08:24:14 UTC

Is getNextKnuthElements the right interface for inline LMs?

As you know, I am looking into white space handling, and this has now
expanded into Unicode line breaking, handling of Unicode formatting
characters (e.g. ZWSP), and getting a handle on all the different break
scenarios and their related Knuth sequences. Joerg threw glyph
merging / substitution into the mix, and then we have l-r writing modes
and BIDI.

What I observed is that most of these issues cannot be solved by looking
at a single character at a time. They need context, very often only one
character, sometimes more (e.g. a sequence of white space). More
importantly, the context needed is not limited to the fo they occur in.
They all span across fos. This is where the current LM structures and
especially the getNextKnuthElements interface really get in the way of
things. Basically one cannot create the correct Knuth sequences without
the context, but the context can come from anywhere (superior fo,
subordinate fo, or neighboring fo). So one needs look-ahead and
backtracking features across all these boundaries, and it feels
extremely messy.

It appears conceptually so much simpler to have only a single loop 
iterating over all the characters in a paragraph doing all the
character/glyph manipulation, word breaking (hyphenation), and line 
breaking analysis and generation of the Knuth sequences in one place. 
An example where this is currently done is the white space handling 
during refinement. One loop at block level based on a recursive char 
iterator that supports deletion and character replacement does the job. 
Very simple and easy to understand. I have something similar in mind 
for inline Knuth sequence generation. Of course the iterator would not 
only return the character but relevant formatting information for it as 
well, e.g. the font so the width etc. can be calculated. The iterator 
may also have to indicate start/end border/padding and conditional
border/padding elements.
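
To make the idea a bit more concrete, a rough Java sketch of such a
single loop might look like the following. Everything in it is an
assumption made for illustration: ParagraphScan, StyledChar, Inline and
scan are made-up names, not existing FOP classes, the font is just a
string, and the "box" / "glue" / "penalty" strings only stand in for
real Knuth elements. It flattens nested inlines recursively, then
collapses white space and marks ZWSP break opportunities in one pass,
regardless of which fo a character came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these types exist in FOP. */
class ParagraphScan {

    /** One character plus the formatting information needed to measure it. */
    static final class StyledChar {
        final char ch;
        final String font; // stand-in for real font/width data
        StyledChar(char ch, String font) { this.ch = ch; this.font = font; }
    }

    /** Simplified inline tree: a text leaf or a group of nested inlines. */
    static final class Inline {
        final String font;
        final String text;
        final List<Inline> children;
        private Inline(String font, String text, List<Inline> children) {
            this.font = font; this.text = text; this.children = children;
        }
        static Inline text(String font, String text) {
            return new Inline(font, text, new ArrayList<Inline>());
        }
        static Inline group(Inline... kids) {
            return new Inline(null, "", Arrays.asList(kids));
        }
    }

    /** Recursive flattening, so the main loop never sees fo boundaries. */
    static void flatten(Inline node, List<StyledChar> out) {
        for (char c : node.text.toCharArray()) {
            out.add(new StyledChar(c, node.font));
        }
        for (Inline child : node.children) {
            flatten(child, out);
        }
    }

    /** Single pass over the paragraph producing simplified element tokens. */
    static List<String> scan(Inline paragraph) {
        List<StyledChar> chars = new ArrayList<StyledChar>();
        flatten(paragraph, chars);
        List<String> out = new ArrayList<String>();
        StringBuilder run = new StringBuilder();
        String runFont = null;
        boolean lastWasSpace = true; // true at start: drops leading white space
        for (StyledChar sc : chars) {
            boolean isSpace = sc.ch == ' ';
            boolean isZwsp = sc.ch == '\u200B';
            // Close the current box on white space, ZWSP, or a font change.
            if (run.length() > 0
                    && (isSpace || isZwsp || !sc.font.equals(runFont))) {
                out.add("box[" + runFont + ":" + run + "]");
                run.setLength(0);
            }
            if (isSpace) {
                // One character of context: collapse runs of spaces,
                // even when they come from different fos.
                if (!lastWasSpace) {
                    out.add("glue");
                }
                lastWasSpace = true;
            } else if (isZwsp) {
                out.add("penalty"); // zero-width break opportunity
            } else {
                run.append(sc.ch);
                runFont = sc.font;
                lastWasSpace = false;
            }
        }
        if (run.length() > 0) {
            out.add("box[" + runFont + ":" + run + "]");
        }
        // Drop trailing white space at the end of the paragraph.
        if (!out.isEmpty() && out.get(out.size() - 1).equals("glue")) {
            out.remove(out.size() - 1);
        }
        return out;
    }
}
```

The point of the sketch is only that the space ending one fo and the
space starting the next are seen by the same loop, so no look-ahead
across LM boundaries is needed. Border/padding markers, hyphenation and
real width calculation are left out.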

Of course that would be quite a change internally although limited to 
inline LMs and not affecting any block level operations. The way to do 
this would be a branch in svn. But before I embark on such an endeavour 
I'd like to seek some feedback on the list. Anyone aware of serious
problems with such an approach? Has it been tried before and failed for 
example? Those who designed the current getNextKnuth approach may have 
arguments why changing it for inline LMs is a bad idea? Any other 
views / concerns?

Thanks

Manuel

Re: Is getNextKnuthElements the right interface for inline LMs?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 07.11.2005 08:24:14 Manuel Mall wrote:
<snip/>
> Of course that would be quite a change internally although limited to 
> inline LMs and not affecting any block level operations. The way to do 
> this would be a branch in svn. But before I embark on such an endeavour 
> I'd like to seek some feedback on the list. Anyone aware of serious
> problems with such an approach?

No.

> Has it been tried before and failed for example?

We had to change a few things during the transition to the Knuth
approach. Sometimes, changes are necessary and it makes no sense to
stubbornly stick to what is already there. 

> Those who designed the current getNextKnuth approach may have 
> arguments why changing it for inline LMs is a bad idea?

I have none. You seem to have good arguments for changing the interface.
Still, care should be taken that the LMs stay as uniform as possible, so
that layout managers for custom elements can be added, and that
non-character content is handled well and without too much custom logic,
because the changed approach focuses strongly on text.

> Any other views / concerns?

That said, it should be noted that I haven't yet dived into the
Unicode stuff you've been discussing lately. I'm very happy about the
flurry of activity in this area. It looked like a good discussion. I
hope you will excuse me if I don't participate too much there right now.


Jeremias Maerki