Posted to fop-dev@xmlgraphics.apache.org by Manuel Mall <mm...@arcus.com.au> on 2005/11/15 05:56:19 UTC
fo:marker and white space
I was looking at clipping warnings generated by
examples/fo/markers/hide.fo when I noticed that white space around
fo:marker seems significant with respect to the output generated when
the marker is retrieved, e.g.:
<fo:marker>
<fo:block>
some text
</fo:block>
</fo:marker>
when retrieved produces:
<empty line>
some text
<empty line>
while:
<fo:marker><fo:block>some text</fo:block></fo:marker>
just generates:
some text
I am suspicious that this is wrong and both inputs should produce the
same output.
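One way to see where the extra lines might come from is to list the text nodes an XML parser reports for the two forms. The sketch below uses Python's xml.etree.ElementTree purely as an illustration; it is not how FOP builds its FO tree, and the xmlns:fo declaration and the helper name whitespace_only_children are added only so the fragment runs standalone.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# The two marker forms from the message, with a namespace declaration
# added so that each fragment parses on its own.
SPACED = f"""<fo:marker xmlns:fo="{FO_NS}">
<fo:block>
some text
</fo:block>
</fo:marker>"""

COMPACT = (
    f'<fo:marker xmlns:fo="{FO_NS}">'
    "<fo:block>some text</fo:block></fo:marker>"
)


def whitespace_only_children(marker_xml):
    """Return the whitespace-only text nodes directly under fo:marker."""
    marker = ET.fromstring(marker_xml)
    nodes = []
    # Text between <fo:marker> and its first child element.
    if marker.text is not None and marker.text.strip() == "":
        nodes.append(marker.text)
    # Text following each child element ("tail" in ElementTree terms).
    for child in marker:
        if child.tail is not None and child.tail.strip() == "":
            nodes.append(child.tail)
    return nodes


print(whitespace_only_children(SPACED))   # ['\n', '\n']
print(whitespace_only_children(COMPACT))  # []
```

The first form carries two whitespace-only text nodes directly under fo:marker and the second carries none, so if nothing strips them, that difference could plausibly account for the two extra lines.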
For a test case and its output see:
http://people.apache.org/~manuel/fop/marker_test.xml
http://people.apache.org/~manuel/fop/marker_test.pdf
Manuel
Re: fo:marker and white space
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Well, my fault. I didn't follow the whole whitespace discussion closely
enough to know every detail. I assumed there was some kind of consensus by now.
On 16.11.2005 11:15:48 Chris Bowditch wrote:
> Jeremias Maerki wrote:
>
> > Sounds like a good plan to me. Would you go after that?
>
> Jeremias: I have similar concerns to Manuel about this. Moving the
> handleWhitespace method to a different class is probably okay, but I
> don't think we should start making any major changes to Whitespace
> handling until we have the design nailed down. It is still unclear
> exactly what the spec intends in some places. Manuel has written a Wiki
> which attempts to document the intention of the spec and presents some
> ideas on how to implement this functionality.
>
> http://wiki.apache.org/xmlgraphics-fop/LineLayout/WhitespaceHandling
>
> >
> > On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
> >
> >>In this respect: I still wonder whether it wouldn't be more
> >>convenient to split up the whitespace handling, and deal with the
> >>inlines separately. Currently, InlineCharIterator needs to generate
> >>boundary characters to indicate start- or end-inline. If we would
> >>deal with the whitespace of the inlines at inline-level itself, it
> >>should become far more straightforward to apply the 'special' rules
> >>(no removal of the first/last space of the inline, or before it).
> >>
> >>On top of that, it does away with the need to chain together all
> >>FOText instances of a whole block (thus making that ugly static
> >>'lastFOTextProcessed' obsolete?)
>
> Chris
Jeremias Maerki
Re: fo:marker and white space
Posted by Chris Bowditch <bo...@hotmail.com>.
Jeremias Maerki wrote:
> Sounds like a good plan to me. Would you go after that?
Jeremias: I have similar concerns to Manuel about this. Moving the
handleWhitespace method to a different class is probably okay, but I
don't think we should start making any major changes to Whitespace
handling until we have the design nailed down. It is still unclear
exactly what the spec intends in some places. Manuel has written a Wiki
which attempts to document the intention of the spec and presents some
ideas on how to implement this functionality.
http://wiki.apache.org/xmlgraphics-fop/LineLayout/WhitespaceHandling
>
> On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
>
>>In this respect: I still wonder whether it wouldn't be more
>>convenient to split up the whitespace handling, and deal with the
>>inlines separately. Currently, InlineCharIterator needs to generate
>>boundary characters to indicate start- or end-inline. If we would
>>deal with the whitespace of the inlines at inline-level itself, it
>>should become far more straightforward to apply the 'special' rules
>>(no removal of the first/last space of the inline, or before it).
>>
>>On top of that, it does away with the need to chain together all
>>FOText instances of a whole block (thus making that ugly static
>>'lastFOTextProcessed' obsolete?)
Chris
Re: fo:marker and white space
Posted by Manuel Mall <mm...@arcus.com.au>.
On Thu, 17 Nov 2005 03:40 am, Simon Pepping wrote:
> On Wed, Nov 16, 2005 at 08:15:47AM +0800, Manuel Mall wrote:
<snip/>
> linefeed-treatment is a local operation on a single character.
>
Yes
> white-space-collapse does not cross FO boundaries because the spec
> limits this to sibling character FOs.
>
Yes, but
 <fo:character character=" "/>
elements are character FO siblings in the XSL-FO sense, but not internally in FOP.
The suggestion to move white space handling to inline will not cover
this case.
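A small sketch may make the fo:character case concrete. It assumes a toy model in which a parent holds a flat list of siblings that are either text nodes or single fo:character nodes; the class names Text and Character and the function collapse_siblings are hypothetical and do not correspond to FOP classes. Collapsing is reduced to "drop a space that directly follows a space".

```python
from dataclasses import dataclass


@dataclass
class Text:
    data: str


@dataclass
class Character:
    character: str  # mirrors the "character" attribute of fo:character


def collapse_siblings(siblings):
    """Collapse consecutive spaces across Text and Character siblings."""
    result = []
    last = None  # last character kept so far, across node boundaries
    for node in siblings:
        if isinstance(node, Character):
            if node.character == " " and last == " ":
                continue  # the whole fo:character node is dropped
            last = node.character
            result.append(node)
        else:
            kept = []
            for ch in node.data:
                if ch == " " and last == " ":
                    continue
                kept.append(ch)
                last = ch
            if kept:  # a text node that became empty is dropped entirely
                result.append(Text("".join(kept)))
    return result


print(collapse_siblings([Text("a "), Character(" "), Text(" b")]))
```

The point of the sketch is only that the "last character seen" state has to be shared between text nodes and fo:character nodes; handling each kind in isolation would leave three spaces in this input.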
> Only white-space-treatment extends beyond FO boundaries, but its
> treatment in handleWhitespace is only the first stage. At line
> building it needs to be revisited.
>
> That means that theoretically handleWhitespace can be done within
> each FO.
Not quite (see above), because of FOP's internal distinction between text
and fo:character.
> But practically it may be better to wait until we have
> settled ideas about this stage, scanning for linebreak opportunities
> and gathering of Knuth elements.
>
I am still of the opinion that we are better off doing as much white space
handling as possible, including white-space-treatment, during refinement. Only
white-space-treatment around "soft breaks" needs to be deferred to the
line breaking phase during layout.
> Simon
Manuel
Re: fo:marker and white space
Posted by Simon Pepping <sp...@leverkruid.nl>.
On Wed, Nov 16, 2005 at 08:15:47AM +0800, Manuel Mall wrote:
> I have no problems with the suggestion to move the white space handling
> from Block into its own class so other fo's that need it can make use
> of it.
>
> However, I still need to be convinced that pushing it down to inline
> level is actually of benefit. I am afraid we will end up with the same
> problem we now have at LM level, that is text for a paragraph needs to
> be analysed across fo boundaries and the current LM structures are very
> much in the way of doing that. Whitespace needs to be handled across fo
> boundaries as well. The current iterator structure was designed to
> exactly facilitate that. It seems to be doing it well and I see no
> reason to replace it.
linefeed-treatment is a local operation on a single character.
white-space-collapse does not cross FO boundaries because the spec
limits this to sibling character FOs.
Only white-space-treatment extends beyond FO boundaries, but its
treatment in handleWhitespace is only the first stage. At line
building it needs to be revisited.
That means that theoretically handleWhitespace can be done within each
FO. But practically it may be better to wait until we have settled
ideas about this stage, scanning for linebreak opportunities and
gathering of Knuth elements.
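As a rough illustration of the first two points, the sketch below applies a per-character linefeed step and a per-node collapse step to each text node independently. It is a deliberately simplified model, not FOP's handleWhitespace: only U+0020 and U+000A are considered, collapsing is reduced to "drop a space that directly follows a space", white-space-treatment is left out entirely because it is the part that crosses FO boundaries, and the function names are made up for the example.

```python
def linefeed_treatment(ch, mode="treat-as-space"):
    """Per-character step: the decision needs no neighbouring characters."""
    if ch != "\n":
        return ch
    if mode == "ignore":
        return ""
    if mode == "treat-as-space":
        return " "
    if mode == "treat-as-zero-width-space":
        return "\u200b"
    return ch  # "preserve"


def collapse(text):
    """Simplified collapse: drop a space that directly follows a space."""
    out = []
    for ch in text:
        if ch == " " and out and out[-1] == " ":
            continue
        out.append(ch)
    return "".join(out)


def refine(text_nodes):
    """Process each text node on its own, never looking at its neighbours."""
    return [
        collapse("".join(linefeed_treatment(ch) for ch in node))
        for node in text_nodes
    ]


print(refine(["a \n b", "  c"]))  # ['a b', ' c']
```

Note that the leading space of the second node survives, because nothing here looks across the node boundary; whether that space should stay is exactly the kind of question that is left to the later stage.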
Simon
--
Simon Pepping
home page: http://www.leverkruid.nl
Re: fo:marker and white space
Posted by Andreas L Delmelle <a_...@pandora.be>.
(Sorry for the delayed reply...)
On Nov 16, 2005, at 01:15, Manuel Mall wrote:
> On Wed, 16 Nov 2005 03:45 am, Jeremias Maerki wrote:
>> Sounds like a good plan to me. Would you go after that?
>>
Sure thing. For now, I'll restrict it to moving handleWhitespace()
into a separate class, maybe one instance for each Flow/StaticContent
or PageSequence? That instance can then be used by all blocks and
markers, carrying state info down the tree. FTM, leaving the iterator
structure unchanged.
> I have no problems with the suggestion to move the white space
> handling
> from Block into its own class so other fo's that need it can make use
> of it.
>
> However, I still need to be convinced that pushing it down to inline
> level is actually of benefit.
Maybe it's something aesthetic, I dunno. In theory, whitespace
handling could already be started from the point where you reach the
first nested start-inline event, why wait until the first start-
block? A choice... and so we are forced to recurse because previous
child nodes could contain text themselves.
It simply seems more 'natural' to have each FO handle its own
whitespace, so the higher level FOs only need to see the first/last
characters of any child inline nested between their own FOText-nodes.
BTW: what is the maximum number of characters you need in a sequence
before you can be certain whether a given whitespace should be
removed/converted? The current implementation seems to indicate that
number to be two or three. True enough, that's purely XML whitespace
handling...
But why wait to begin processing until you have, maybe a few hundred
characters? Since *all* whitespace is passed through by the parser,
IMO the sooner you can throw excess space characters away, the
better. Even more so if it's excess fo:character objects.
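To make the "two or three characters" estimate concrete, here is a sketch of a streaming filter that decides the fate of each character from a window of three: the previous, the current and the next character. The rule used, "drop a space whose immediate neighbour is a linefeed", is an assumed simplification chosen for the example and is not claimed to match the spec or FOP; the function name is hypothetical as well.

```python
def drop_spaces_next_to_linefeed(chars):
    """Yield characters, skipping any space directly before or after a linefeed."""
    it = iter(chars)
    prev = None           # previous character read, whether kept or dropped
    cur = next(it, None)  # character currently being decided
    while cur is not None:
        nxt = next(it, None)  # one character of lookahead
        if not (cur == " " and (prev == "\n" or nxt == "\n")):
            yield cur
        prev, cur = cur, nxt


print(repr("".join(drop_spaces_next_to_linefeed("a \n b"))))  # 'a\nb'
```

Because it is a generator, characters can be discarded as soon as their two neighbours are known, without buffering a whole block. A run of several spaces next to a linefeed is only trimmed by one space under this rule, so handling runs would need a little more state than the window shown here.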
> I am afraid we will end up with the same
> problem we now have at LM level, that is text for a paragraph needs to
> be analysed across fo boundaries and the current LM structures are
> very
> much in the way of doing that.
1) Agreed that the LayoutManagers definitely may need more context
than a handful of characters to make sound decisions. Looking for
line-breaks, now there we really need to look across FO boundaries.
2) There is no inherent contradiction between handling whitespace at
each block/inline level, and handling whitespace across FO
boundaries. The latter refers more to the net result of the whole
algorithm.
--and so, I wonder...
> Whitespace needs to be handled across fo
> boundaries as well. The current iterator structure was designed to
> exactly facilitate that. It seems to be doing it well and I see no
> reason to replace it.
...Hmm. Are the iterators themselves used by layout? AFAICS, that's a
No. Maybe they are in the wrong package ATM? ;-)
IMO, the current iterator structure, in combination with chaining all
those FOText instances together, is something that does need to be
revisited (as in: definitely). Not for an alpha-release, but some
comments in FOText clearly indicate that it was never the intention
to keep it that way. If I get the timeline correctly, the current
FOText design predates the separation of layout-logic. In the LM-
tree, a BlockLM needs to be able to see all text-nodes of its
descendants, but I don't immediately see a reason why a Block in the
FOTree needs to.
To merge in another part of the thread:
On Nov 17, 2005, at 00:28, Manuel Mall wrote:
> On Thu, 17 Nov 2005 03:40 am, Simon Pepping wrote:
>>
>> linefeed-treatment is a local operation on a single character.
>>
> Yes
>
>> white-space-collapse does not cross FO boundaries because the spec
>> limits this to sibling character FOs.
>>
> Yes, but
>
>  <fo:character character=" "/>
>
> elements are character FO siblings in the XSL-FO sense, but not internally in FOP.
> The suggestion to move white space handling to inline will not cover
> this case.
Not in itself, but deleting the character node would be simpler when
processing its parent than when processing an ancestor X levels up.
WRT whitespace handling during refinement, inlines have more in
common with markers and blocks than with text-nodes, and fo:character
is more to be treated like a text-node than an inline.
Cheers,
Andreas
Re: fo:marker and white space
Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 16 Nov 2005 03:45 am, Jeremias Maerki wrote:
> Sounds like a good plan to me. Would you go after that?
>
I have no problems with the suggestion to move the white space handling
from Block into its own class so other fo's that need it can make use
of it.
However, I still need to be convinced that pushing it down to inline
level is actually of benefit. I am afraid we will end up with the same
problem we now have at LM level, that is text for a paragraph needs to
be analysed across fo boundaries and the current LM structures are very
much in the way of doing that. Whitespace needs to be handled across fo
boundaries as well. The current iterator structure was designed to
exactly facilitate that. It seems to be doing it well and I see no
reason to replace it.
Manuel
> On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
> > On Nov 15, 2005, at 10:03, Jeremias Maerki wrote:
> >
> > <snip />
> >
> > > The fix is probably to extract handleWhitespace
> > > from Block into a separate class and call it from Block and
> > > Marker.
> >
> > In this respect: I still wonder whether it wouldn't be more
> > convenient to split up the whitespace handling, and deal with the
> > inlines separately. Currently, InlineCharIterator needs to generate
> > boundary characters to indicate start- or end-inline. If we would
> > deal with the whitespace of the inlines at inline-level itself, it
> > should become far more straightforward to apply the 'special' rules
> > (no removal of the first/last space of the inline, or before it).
> >
> > On top of that, it does away with the need to chain together all
> > FOText instances of a whole block (thus making that ugly static
> > 'lastFOTextProcessed' obsolete?)
> >
> > Extracting handleWhitespace() into a separate class would, in any
> > case, be A Good Thing.
> >
> > My 2 cents.
> >
> > Cheers,
> >
> > Andreas
>
> Jeremias Maerki
Re: fo:marker and white space
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Sounds like a good plan to me. Would you go after that?
On 15.11.2005 18:06:13 Andreas L Delmelle wrote:
> On Nov 15, 2005, at 10:03, Jeremias Maerki wrote:
>
> <snip />
> > The fix is probably to extract handleWhitespace
> > from Block into a separate class and call it from Block and Marker.
>
> In this respect: I still wonder whether it wouldn't be more
> convenient to split up the whitespace handling, and deal with the
> inlines separately. Currently, InlineCharIterator needs to generate
> boundary characters to indicate start- or end-inline. If we would
> deal with the whitespace of the inlines at inline-level itself, it
> should become far more straightforward to apply the 'special' rules
> (no removal of the first/last space of the inline, or before it).
>
> On top of that, it does away with the need to chain together all
> FOText instances of a whole block (thus making that ugly static
> 'lastFOTextProcessed' obsolete?)
>
> Extracting handleWhitespace() into a separate class would, in any
> case, be A Good Thing.
>
> My 2 cents.
>
> Cheers,
>
> Andreas
Jeremias Maerki
Re: fo:marker and white space
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Nov 15, 2005, at 10:03, Jeremias Maerki wrote:
<snip />
> The fix is probably to extract handleWhitespace
> from Block into a separate class and call it from Block and Marker.
In this respect: I still wonder whether it wouldn't be more
convenient to split up the whitespace handling, and deal with the
inlines separately. Currently, InlineCharIterator needs to generate
boundary characters to indicate start- or end-inline. If we would
deal with the whitespace of the inlines at inline-level itself, it
should become far more straightforward to apply the 'special' rules
(no removal of the first/last space of the inline, or before it).
On top of that, it does away with the need to chain together all
FOText instances of a whole block (thus making that ugly static
'lastFOTextProcessed' obsolete?)
Extracting handleWhitespace() into a separate class would, in any
case, be A Good Thing.
My 2 cents.
Cheers,
Andreas
Re: fo:marker and white space
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Debugging shows:
The FOText instances under fo:marker (just before and after the fo:block)
don't get processed for whitespace treatment. Block.handleWhitespace
isn't accessible to it. That's why the whitespace isn't removed and
causes additional lines. The fix is probably to extract handleWhitespace
from Block into a separate class and call it from Block and Marker.
So, this is a bug, to be fixed after the release I guess.
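The shape of such a fix might look roughly like the sketch below: the whitespace logic sits in its own class, and both Block and Marker call it. This is a Python illustration of the structure only, not FOP code. The class WhitespaceHandler, the method names and the rule applied (drop whitespace-only text children) are all assumptions made for the example; the real rules are considerably more involved.

```python
class WhitespaceHandler:
    """Hypothetical home for logic that would otherwise live only in Block."""

    def handle_whitespace(self, children):
        # Stand-in rule: drop text children that consist only of whitespace.
        return [
            child for child in children
            if not (isinstance(child, str) and child.strip() == "")
        ]


class Block:
    def __init__(self, children):
        self.children = children

    def end_of_node(self, handler):
        self.children = handler.handle_whitespace(self.children)


class Marker:
    def __init__(self, children):
        self.children = children

    def end_of_node(self, handler):
        # With the logic extracted, Marker can call the same handler as Block.
        self.children = handler.handle_whitespace(self.children)


handler = WhitespaceHandler()
block = Block(["\nsome text\n"])
marker = Marker(["\n", block, "\n"])
block.end_of_node(handler)
marker.end_of_node(handler)
print(len(marker.children))  # 1
```

After both calls the marker holds only the block, so the whitespace-only nodes before and after fo:block no longer reach layout in this toy model.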
On 15.11.2005 05:56:19 Manuel Mall wrote:
> I was looking at clipping warnings generated by
> examples/fo/markers/hide.fo when I noticed that white space around
> fo:marker seems significant with respect to the output generated when
> the marker is retrieved, e.g.:
>
> <fo:marker>
> <fo:block>
> some text
> </fo:block>
> </fo:marker>
>
> when retrieved produces:
>
> <empty line>
> some text
> <empty line>
>
> while:
>
> <fo:marker><fo:block>some text</fo:block></fo:marker>
>
> just generates:
>
> some text
>
> I am suspicious that this is wrong and both inputs should produce the
> same output.
>
> For a test case and its output see:
> http://people.apache.org/~manuel/fop/marker_test.xml
> http://people.apache.org/~manuel/fop/marker_test.pdf
>
> Manuel
Jeremias Maerki