Posted to fop-dev@xmlgraphics.apache.org by Manuel Mall <mm...@arcus.com.au> on 2005/11/15 05:56:19 UTC

fo:marker and white space

I was looking at clipping warnings generated by 
examples/fo/markers/hide.fo when I noticed that white space around 
fo:marker seems significant with respect to the output generated when 
the marker is retrieved, e.g.:

<fo:marker>
   <fo:block>
     some text
   </fo:block>
</fo:marker>

when retrieved produces:

<empty line>
some text
<empty line>

while:

<fo:marker><fo:block>some text</fo:block></fo:marker>

just generates:

some text

I suspect that this is wrong and that both inputs should produce the 
same output.
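To make the expected behaviour concrete, here is a minimal Java sketch that models marker content as a flat list of tokens and applies a deliberately simplified rule: runs of whitespace collapse to one space, and a space that touches a block boundary (or the edge of the marker) is dropped. It stands in for the XSL-FO refinement and line-building steps rather than implementing them, and the token encoding and the class name `MarkerContentSketch` are hypothetical, not FOP code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token model of fo:marker content; not part of FOP. */
final class MarkerContentSketch {
    private MarkerContentSketch() { }

    /**
     * Tokens are "<block>", "</block>" or character data. Whitespace runs are
     * collapsed to one space, and a space touching a block boundary (or the
     * edge of the marker) is dropped. A simplification, not the spec's rules.
     */
    static List<String> normalize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (isBoundary(t)) {
                out.add(t);
                continue;
            }
            // Collapse every whitespace run (linefeeds included) to one space.
            String text = t.replaceAll("\\s+", " ");
            // Drop a leading space after a block boundary or the marker start.
            if (i == 0 || isBoundary(tokens.get(i - 1))) {
                text = text.replaceFirst("^ ", "");
            }
            // Drop a trailing space before a block boundary or the marker end.
            if (i == tokens.size() - 1 || isBoundary(tokens.get(i + 1))) {
                text = text.replaceFirst(" $", "");
            }
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        return out;
    }

    private static boolean isBoundary(String token) {
        return token.equals("<block>") || token.equals("</block>");
    }
}
```

Under this rule the indented form and the single-line form both reduce to `[<block>, some text, </block>]`, which is the equivalence the report expects.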

For a test case and its output see:
http://people.apache.org/~manuel/fop/marker_test.xml
http://people.apache.org/~manuel/fop/marker_test.pdf

Manuel

Re: fo:marker and white space

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Well, my fault. I didn't follow the whole whitespace discussion closely
enough to know every detail. I assumed there was some kind of consensus by now.

On 16.11.2005 11:15:48 Chris Bowditch wrote:
> Jeremias Maerki wrote:
> 
> > Sounds like a good plan to me. Would you go after that?
> 
> Jeremias: I have similar concerns to Manuel about this. Moving the 
> handleWhitespace method to a different class is probably okay, but I 
> don't think we should start making any major changes to Whitespace 
> handling until we have the design nailed down. It is still unclear 
> exactly what the spec intends in some places. Manuel has written a Wiki 
> which attempts to document the intention of the spec and presents some 
> ideas on how to implement this functionality.
> 
> <snip/>
> Chris



Jeremias Maerki


Re: fo:marker and white space

Posted by Chris Bowditch <bo...@hotmail.com>.
Jeremias Maerki wrote:

> Sounds like a good plan to me. Would you go after that?

Jeremias: I have similar concerns to Manuel about this. Moving the 
handleWhitespace method to a different class is probably okay, but I 
don't think we should start making any major changes to Whitespace 
handling until we have the design nailed down. It is still unclear 
exactly what the spec intends in some places. Manuel has written a Wiki 
which attempts to document the intention of the spec and presents some 
ideas on how to implement this functionality.

http://wiki.apache.org/xmlgraphics-fop/LineLayout/WhitespaceHandling

> 
> On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
> 
>>In this respect: I still wonder whether it wouldn't be more  
>>convenient to split up the whitespace handling, and deal with the  
>>inlines separately. Currently, InlineCharIterator needs to generate  
>>boundary characters to indicate start- or end-inline. If we would  
>>deal with the whitespace of the inlines at inline-level itself, it  
>>should become far more straightforward to apply the 'special' rules  
>>(no removal of the first/last space of the inline, or before it).
>>
>>On top of that, it does away with the need to chain together all  
>>FOText instances of a whole block (thus making that ugly static  
>>'lastFOTextProcessed' obsolete?)

Chris



Re: fo:marker and white space

Posted by Manuel Mall <mm...@arcus.com.au>.
On Thu, 17 Nov 2005 03:40 am, Simon Pepping wrote:
> On Wed, Nov 16, 2005 at 08:15:47AM +0800, Manuel Mall wrote:
<snip/>
> linefeed-treatment is a local operation on a single character.
>
Yes

> white-space-collapse does not cross FO boundaries because the spec
> limits this to sibling character FOs.
>
Yes, but

&#x20;<fo:character character=" "/>

are sibling character FOs in the XSL-FO sense, but not inside FOP. 
The suggestion to move white space handling to the inline level will 
not cover this case. 
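One way to picture the case: the characters of a text node and an adjacent fo:character form one run of sibling character FOs, so white-space-collapse has to see them together even though FOP keeps them as different node types. The sketch below uses hypothetical classes (`CharSource`, `TextNode`, `CharacterNode`, `SiblingCollapse`; not FOP's FOText or its fo:character class) and flattens the siblings into one sequence before collapsing. Collapse is simplified here to "a U+0020 that directly follows another U+0020 is removed", assuming white-space-collapse="true".

```java
import java.util.List;

/** Hypothetical common view of anything that contributes characters. */
interface CharSource {
    /** Appends this node's characters, in document order, to the run. */
    void appendTo(StringBuilder run);
}

/** Hypothetical stand-in for a text node. */
final class TextNode implements CharSource {
    private final String text;
    TextNode(String text) { this.text = text; }
    public void appendTo(StringBuilder run) { run.append(text); }
}

/** Hypothetical stand-in for an fo:character. */
final class CharacterNode implements CharSource {
    private final char character;
    CharacterNode(char character) { this.character = character; }
    public void appendTo(StringBuilder run) { run.append(character); }
}

final class SiblingCollapse {
    private SiblingCollapse() { }

    /** Flattens sibling nodes into one run, then drops a space that follows a space. */
    static String collapse(List<CharSource> siblings) {
        StringBuilder run = new StringBuilder();
        for (CharSource s : siblings) {
            s.appendTo(run);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < run.length(); i++) {
            char c = run.charAt(i);
            boolean afterSpace = out.length() > 0 && out.charAt(out.length() - 1) == ' ';
            if (c == ' ' && afterSpace) {
                continue; // removed regardless of which node type it came from
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Processing each node type in isolation would see `"a "` and `" "` separately and keep both spaces; flattening first is what lets the second one go.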

> Only white-space-treatment extends beyond FO boundaries, but its
> treatment in handleWhitespace is only the first stage. At line
> building it needs to be revisited.
>
> That means that theoretically handleWhitespace can be done within
> each FO. 

Not quite (see above), because of FOP's internal distinction between 
text and fo:character.

> But practically it may be better to wait until we have 
> settled ideas about this stage, scanning for linebreak opportunities
> and gathering of Knuth elements.
>

I am still of the opinion that we are better off doing as much white 
space handling as possible, including white-space-treatment, during 
refinement. Only white-space-treatment around "soft breaks" needs to be 
deferred to the line-breaking phase during layout.
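A minimal sketch of that split, assuming refinement has already collapsed space runs so that only single U+0020 characters remain. A simple greedy line filler (a stand-in for FOP's Knuth-style breaking, not its actual algorithm) chooses soft breaks at spaces and only then drops the space at each break, because that space can only be identified once the break has been chosen. `SoftBreakSketch` and `breakLines` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy breaker; FOP itself uses Knuth-style line breaking. */
final class SoftBreakSketch {
    private SoftBreakSketch() { }

    /**
     * Breaks already-collapsed text into lines of at most maxWidth characters
     * (a word longer than maxWidth gets a line of its own). The space at each
     * chosen soft break is removed here, at line-breaking time, not during
     * refinement.
     */
    static List<String> breakLines(String collapsed, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : collapsed.split(" ")) {
            if (line.length() > 0 && line.length() + 1 + word.length() > maxWidth) {
                lines.add(line.toString()); // the joining space is dropped at the break
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' '); // no break chosen here, so the space survives
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With a width of 9, `"some text here"` becomes the lines `"some text"` and `"here"`: the space inside the first line is kept, the one at the break disappears.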

> Simon

Manuel

Re: fo:marker and white space

Posted by Simon Pepping <sp...@leverkruid.nl>.
On Wed, Nov 16, 2005 at 08:15:47AM +0800, Manuel Mall wrote:
> I have no problems with the suggestion to move the white space handling 
> from Block into its own class so other fo's that need it can make use 
> of it.
> 
> However, I still need to be convinced that pushing it down to inline 
> level is actually of benefit. I am afraid we will end up with the same 
> problem we now have at LM level, that is text for a paragraph needs to 
> be analysed across fo boundaries and the current LM structures are very 
> much in the way of doing that. Whitespace needs to be handled across fo 
> boundaries as well. The current iterator structure was designed to 
> exactly facilitate that. It seems to be doing it well and I see no 
> reason to replace it.

linefeed-treatment is a local operation on a single character.
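The locality of linefeed-treatment can be illustrated with a small Java sketch: each linefeed is mapped on its own, so the step could run anywhere in the tree without looking at siblings or FO boundaries. The four value names come from the XSL-FO property (`inherit` is left out); the enum and class names are hypothetical.

```java
/** Values of the XSL-FO linefeed-treatment property (inherit omitted). */
enum LinefeedTreatment { IGNORE, PRESERVE, TREAT_AS_SPACE, TREAT_AS_ZERO_WIDTH_SPACE }

/** Hypothetical helper; not FOP code. */
final class LinefeedSketch {
    private LinefeedSketch() { }

    /**
     * Maps a single character in isolation: only a linefeed (U+000A) is
     * affected, and the result never depends on neighbouring characters.
     */
    static String apply(char c, LinefeedTreatment treatment) {
        if (c != '\n') {
            return String.valueOf(c);
        }
        switch (treatment) {
            case IGNORE: return "";
            case PRESERVE: return "\n";
            case TREAT_AS_SPACE: return " ";
            case TREAT_AS_ZERO_WIDTH_SPACE: return "\u200B";
            default: throw new IllegalArgumentException(String.valueOf(treatment));
        }
    }

    /** Applies the per-character mapping to a whole string. */
    static String apply(String text, LinefeedTreatment treatment) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            out.append(apply(text.charAt(i), treatment));
        }
        return out.toString();
    }
}
```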

white-space-collapse does not cross FO boundaries because the spec
limits this to sibling character FOs.

Only white-space-treatment extends beyond FO boundaries, but its
treatment in handleWhitespace is only the first stage. At line
building it needs to be revisited.

That means that theoretically handleWhitespace can be done within each
FO. But practically it may be better to wait until we have settled
ideas about this stage, scanning for linebreak opportunities and
gathering of Knuth elements.

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl


Re: fo:marker and white space

Posted by Andreas L Delmelle <a_...@pandora.be>.
(Sorry for the delayed reply...)

On Nov 16, 2005, at 01:15, Manuel Mall wrote:

> On Wed, 16 Nov 2005 03:45 am, Jeremias Maerki wrote:
>> Sounds like a good plan to me. Would you go after that?
>>

Sure thing. For now, I'll restrict it to moving handleWhitespace() 
into a separate class, maybe with one instance for each 
Flow/StaticContent or PageSequence? That instance can then be used by 
all blocks and markers, carrying state info down the tree. For the 
moment, I'll leave the iterator structure unchanged.
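A rough sketch of what such a shared instance might look like, under the assumption that the only state it needs to carry across nodes is whether the last character emitted was a space. `SharedWhitespaceHandler`, `handleText` and `atBlockBoundary` are hypothetical names, not FOP classes; the point is only that Block and Marker would call the same object instead of Block owning the logic.

```java
/**
 * Hypothetical per-flow whitespace handler. One instance would be created per
 * Flow/StaticContent (or PageSequence) and passed to every Block and Marker.
 */
final class SharedWhitespaceHandler {
    // State carried across text nodes: did the previous output end in a space?
    // Starts true so that whitespace at the start of a block is dropped.
    private boolean lastWasSpace = true;

    /** Collapses XML whitespace in one text node, continuing from earlier nodes. */
    String handleText(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ws = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            if (ws) {
                if (!lastWasSpace) {
                    out.append(' ');
                }
                lastWasSpace = true;
            } else {
                out.append(c);
                lastWasSpace = false;
            }
        }
        // A trailing space is kept; removing it needs a later pass or lookahead.
        return out.toString();
    }

    /** Resets the carried state at a block (or marker) boundary. */
    void atBlockBoundary() {
        lastWasSpace = true;
    }
}
```

Because the state survives between calls, the leading spaces of the second text node in `"\n  some "` + `"  text\n"` are dropped even though they belong to a different node.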

> I have no problems with the suggestion to move the white space handling
> from Block into its own class so other fo's that need it can make use
> of it.
> of it.
>
> However, I still need to be convinced that pushing it down to inline
> level is actually of benefit.

Maybe it's something aesthetic, I dunno. In theory, whitespace 
handling could already start as soon as you reach the first nested 
start-inline event; why wait until the first start-block? It is a 
choice... and it is what forces us to recurse, because previous 
child nodes could contain text themselves.
It simply seems more 'natural' to have each FO handle its own  
whitespace, so the higher level FOs only need to see the first/last  
characters of any child inline nested between their own FOText-nodes.
BTW: what is the maximum number of characters you need in a sequence  
before you can be certain whether a given whitespace should be  
removed/converted? The current implementation seems to indicate that 
the number is two or three. True enough, that's purely XML whitespace 
handling...
But why wait to begin processing until you have maybe a few hundred 
characters? Since *all* whitespace is passed through by the parser,  
IMO the sooner you can throw excess space characters away, the  
better. Even more so if it's excess fo:character objects.
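For plain collapse with block-edge trimming, a streaming sketch suggests the lookahead can be as small as one character: the processor below holds back at most one pending space, throws away further whitespace as soon as it arrives, and emits the pending space only when a following non-space character shows up (so a space just before the end of a block is never emitted). It covers only that simplified rule, not the full set of XSL-FO whitespace properties, and `StreamingCollapse` with its `onChars`/`endBlock` methods is hypothetical; the signature merely mimics the shape of a SAX `characters` callback.

```java
/** Hypothetical streaming whitespace filter fed chunk by chunk, like SAX characters(). */
final class StreamingCollapse {
    private final StringBuilder out = new StringBuilder();
    private boolean pendingSpace = false; // the only buffered whitespace: one space
    private boolean atBlockStart = true;  // leading whitespace of a block is dropped

    void onChars(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Excess whitespace is discarded immediately; only a flag remains.
                pendingSpace = !atBlockStart;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
                atBlockStart = false;
            }
        }
    }

    /** Ends the block: a space still pending at the block end is dropped. */
    String endBlock() {
        pendingSpace = false;
        atBlockStart = true;
        String result = out.toString();
        out.setLength(0);
        return result;
    }
}
```

Splitting the input across several `onChars` calls gives the same result as one call, which is the property that would allow processing to start before the whole block has been read.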

> I am afraid we will end up with the same
> problem we now have at LM level, that is text for a paragraph needs to
> be analysed across fo boundaries and the current LM structures are
> very much in the way of doing that.

1) Agreed that the LayoutManagers may definitely need more context 
than a handful of characters to make sound decisions. When looking for 
line breaks, we really do need to look across FO boundaries.
2) There is no inherent contradiction between handling whitespace at  
each block/inline level, and handling whitespace across FO  
boundaries. The latter refers more to the net result of the whole  
algorithm.

--and so, I wonder...

> Whitespace needs to be handled across fo
> boundaries as well. The current iterator structure was designed to
> exactly facilitate that. It seems to be doing it well and I see no
> reason to replace it.

...Hmm. Are the iterators themselves used by layout? AFAICS, that's a  
No. Maybe they are in the wrong package ATM? ;-)
IMO, the current iterator structure, in combination with chaining all  
those FOText instances together, is something that does need to be  
revisited (as in: definitely). Not for an alpha release, but some 
comments in FOText clearly indicate that it was never intended to 
stay that way. If I understand the timeline correctly, the current 
FOText design predates the separation of the layout logic. In the LM 
tree, a BlockLM needs to be able to see all text nodes of its 
descendants, but I don't immediately see why a Block in the 
FOTree needs to.

To merge in another part of the thread:

On Nov 17, 2005, at 00:28, Manuel Mall wrote:
> On Thu, 17 Nov 2005 03:40 am, Simon Pepping wrote:
>>
>> linefeed-treatment is a local operation on a single character.
>>
> Yes
>
>> white-space-collapse does not cross FO boundaries because the spec
>> limits this to sibling character FOs.
>>
> Yes, but
>
> &#x20;<fo:character character=" "/>
>
> are fo character siblings in the XSL-FO sense but not fop internally.
> The suggestion to move white space handling to inline will not cover
> this case.

Not in itself, but deleting the character node would be simpler while 
processing its parent than while processing an ancestor X levels up.
WRT whitespace handling during refinement, inlines have more in 
common with markers and blocks than with text nodes, and fo:character 
should be treated more like a text node than like an inline.



Cheers,

Andreas


Re: fo:marker and white space

Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 16 Nov 2005 03:45 am, Jeremias Maerki wrote:
> Sounds like a good plan to me. Would you go after that?
>
I have no problems with the suggestion to move the white space handling 
from Block into its own class so other fo's that need it can make use 
of it.

However, I still need to be convinced that pushing it down to inline 
level is actually of benefit. I am afraid we will end up with the same 
problem we now have at LM level, that is text for a paragraph needs to 
be analysed across fo boundaries and the current LM structures are very 
much in the way of doing that. Whitespace needs to be handled across fo 
boundaries as well. The current iterator structure was designed to 
exactly facilitate that. It seems to be doing it well and I see no 
reason to replace it.

Manuel
> On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
> > <snip/>
>
> Jeremias Maerki

Re: fo:marker and white space

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Sounds like a good plan to me. Would you go after that?

On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
> On Nov 15, 2005, at 10:03, Jeremias Maerki wrote:
> 
> <snip />
> > The fix is probably to extract handleWhitespace
> > from Block into a separate class and call it from Block and Marker.
> 
> <snip/>
> 
> Extracting handleWhitespace() into a separate class would, in any  
> case, be A Good Thing.
> 
> My 2 cents.
> 
> Cheers,
> 
> Andreas



Jeremias Maerki


Re: fo:marker and white space

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Nov 15, 2005, at 10:03, Jeremias Maerki wrote:

<snip />
> The fix is probably to extract handleWhitespace
> from Block into a separate class and call it from Block and Marker.

In this respect: I still wonder whether it wouldn't be more  
convenient to split up the whitespace handling, and deal with the  
inlines separately. Currently, InlineCharIterator needs to generate  
boundary characters to indicate start- or end-inline. If we would  
deal with the whitespace of the inlines at inline-level itself, it  
should become far more straightforward to apply the 'special' rules  
(no removal of the first/last space of the inline, or before it).

On top of that, it does away with the need to chain together all  
FOText instances of a whole block (thus making that ugly static  
'lastFOTextProcessed' obsolete?)

Extracting handleWhitespace() into a separate class would, in any  
case, be A Good Thing.

My 2 cents.

Cheers,

Andreas


Re: fo:marker and white space

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Debugging shows that the FOText instances under fo:marker (just before
and after the fo:block) don't get processed for whitespace treatment,
because Block.handleWhitespace isn't accessible to Marker. That's why
the whitespace isn't removed and
causes additional lines. The fix is probably to extract handleWhitespace
from Block into a separate class and call it from Block and Marker.
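Structurally, the proposed fix could look roughly like the sketch below: the whitespace routine moves out of Block into its own class, and both Block and Marker call it on their children, so the text nodes around the fo:block inside a marker are processed too. All names here (`Node`, `Text`, `WhitespaceHandling`, `BlockNode`, `MarkerNode`, `finish`) are hypothetical stand-ins, and the handling is reduced to dropping whitespace-only text children; FOP's real handleWhitespace does considerably more.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal FO node model; not FOP's FONode hierarchy. */
abstract class Node {
    final List<Node> children = new ArrayList<>();
}

/** Hypothetical character-data node. */
final class Text extends Node {
    final String chars;
    Text(String chars) { this.chars = chars; }
}

/** The extracted routine, no longer private to the block class. */
final class WhitespaceHandling {
    private WhitespaceHandling() { }

    /** Simplified: drops whitespace-only text children of the given parent. */
    static void handle(Node parent) {
        parent.children.removeIf(
            n -> n instanceof Text && ((Text) n).chars.trim().isEmpty());
    }
}

final class BlockNode extends Node {
    /** Called once the block's end tag has been parsed. */
    void finish() { WhitespaceHandling.handle(this); }
}

final class MarkerNode extends Node {
    /** The missing piece: markers run the same whitespace handling as blocks. */
    void finish() { WhitespaceHandling.handle(this); }
}
```

With this structure the `"\n   "` and `"\n"` text nodes on either side of the fo:block in the indented marker are removed when the marker finishes, leaving only the block.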

So, this is a bug, to be fixed after the release I guess.

On 15.11.2005 05:56:19 Manuel Mall wrote:
> <snip/>
> Manuel



Jeremias Maerki