You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2005/12/30 14:38:07 UTC

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

On Dec 30, 2005, at 14:33, adelmelle@apache.org wrote:

> Author: adelmelle
> Date: Fri Dec 30 05:33:18 2005
> New Revision: 360083
>
> URL: http://svn.apache.org/viewcvs?rev=360083&view=rev
> Log:
> Revision of refinement white-space handling (cfr. Bugzilla 37639)

Brief description of the changes:
1) extraction of the handleWhiteSpace() method into a separate class  
(org.apache.fop.fo.XMLWhiteSpaceHandler)
2) altering the algorithm to trigger white-space handling for all FOs  
that can contain FOText or Character children (= all FObjMixed, with  
the notable exception of Leader), at two points:
   * addChildNode()
   * endOfNode()

Case not covered by the altered code (but I didn't think it to be a  
showstopper):

If you have:
   <fo:block>
     <fo:inline>some inline text _
____</fo:inline>_
__</fo:block>


Currently, the first series of underlined white-space is not  
completely suppressed. It will at most be collapsed to a single  
space. The second series --between endInline() and endBlock()-- is  
completely suppressed because handleWhiteSpace() was called from  
Block.endOfNode().

I explicitly excluded fo:leaders from white-space handling, because  
for example:

<fo:leader leader-pattern="use-content">   xxx   </fo:leader>

Collapsing the three spaces to one may produce unintended results.

OTOH, if you have a nested inline in a leader, then the white-space  
for the inline will be treated...

For the rest only a few minor updates to related test-cases:
- block_white-space-collapse_2.xml: see info disabled-testcases.xml
- leader_text-align.xml / leader_toc.xml: update of the expected ipd  
values; they seemed to ignore preserved spaces

Things left to consider:
* the Iterator structure: as it stands, we only need to iterate over  
FOText and Character, so it seems like there is a possibility for  
radical simplification in that area.
* using a similar strategy for text-transform, so we can get rid of  
the static FOText.lastFOTextProcessed


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Sun, 1 Jan 2006 12:02 am, Andreas L Delmelle wrote:
> On Dec 31, 2005, at 16:05, Manuel Mall wrote:
>
> The possible problem I saw with the block-level white-space handling
> was that all white-space characters would continue to take up memory
> until the first nested block or in the worst case, until the end-of-
> block. In case of large blocks with lots of indents due to pretty-
> printing, the current approach makes these spaces disappear much
> sooner (= more memory-efficient).
>
Andreas,

you can't be serious here. Keeping a few whitespace characters until the 
end of a block is reached is completely irrelevant with respect to FOPs 
memory consumption and should not play even the slightest consideration 
when comes to choice of algorithm. If this is the only reason which 
stops you from doing end of paragraph line-feed-treatment during 
refinement then please revise the algorithm to do so.

<snip/>
>
> Cheers,
>
> Andreas

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
That proves the point that I shouldn't meddle in things I don't fully
understand, yet, and don't have enough time to really get to know.
Lesson learnt.

On 04.01.2006 13:10:42 Manuel Mall wrote:
<snip/>
> 1. The patch is not solving the whitespace handling problem for markers 
> which was one of its initial drivers. We can blame Jeremias here - just 
> to drag in another innocent party :-) - as he suggested factoring out 
> the fo:block specific whitespace refinement so it can be applied to 
> markers. Unfortunately that was a bad idea.
<snip/>

Jeremias Maerki


Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Fri, 6 Jan 2006 04:56 am, Andreas L Delmelle wrote:
> On Jan 5, 2006, at 18:48, Andreas L Delmelle wrote:
>
> <snip />
>
> To summarize this thread (it has taken long enough :-))
>
> I thought it over a bit more, and what I'm currently working on (and
> will most likely finish during the weekend) is the following:
>
> 1) Basically keep the algorithm the way I recently altered it, but
> containing some additional processing for trailing inline FOs that
> end with a sequence of white-space. Determining this last bit is easy
> enough, since it just means that XMLWhiteSpaceHandler.inWhiteSpace
> will be false after handleWhiteSpace(). At the end of the block, we
> will do one more pass over all those trailing inlines, if any.
> IMO, in the vast majority of use-cases there will be either zero, one
> or at most two of those, but theoretically this could be any
> number... If there are any, then if white-space-collapse has the
> default value of "true" there will be only one trailing white-space
> character left at that point, so this additional bit of processing
> will cost virtually nothing.
>
> 2) Simplify the CharIterator structure, in the sense that we'll still
> only need an iterator over FOText and Characters. Unless layout needs
> access to the iterators, I think charIterator() can be pushed down to
> be specific to FObjMixed, and then the overrides of this method can
> be removed from all other FOs apart from FOText and Character. For
> 1), it could turn out handy if I add the possibility to iterate
> backwards until the last non-white-space is encountered...
>
> 3) Exclude markers (and their descendants) from white-space handling
> during refinement, for the mentioned reasons:
>    * retrieve-marker's ancestor's white-space properties govern the
> treatment in this case
>    * possibly page-break context is needed when dealing with
> alternating static-contents
>    * retrieve-markers with retrieve-boundary="document"
>
> 3) of course means the recently enabled marker_bug.xml testcase will
> have to be disabled again until we find a way to tackle this in
> layout. I had thought of using XMLWhiteSpaceHandler itself for this,
> but the tricky part is that, once a Marker (and its descendants) have
> been white-space-treated, the stripped white-space is permanently
> gone, and since that same Marker can again be retrieved in a
> different context etc.
>
> [end-of-thread, I hope ;-)]
>

Thanks for the summary and yes I think we are at the end of this one.

Personally I would not do 3) at this point in time, that is I would not 
exclude markers from the whitespace refinement. IMO the whitespace 
handling properties will have their default values (or matching values 
in the marker and retrieve-marker contexts) most of the time and 
therefore the current handling produces better results more often than 
by reverting that part of the patch. But this is a judgement call and I 
am not really fussed. There is a testcase which shows how it fails when 
the properties are not matching and this should suffice to document the 
problem.

> Cheers,
>
> Andreas

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Chris Bowditch <bo...@hotmail.com>.
Andreas L Delmelle wrote:

<snip what="excellent summary"/>

> 
> [end-of-thread, I hope ;-)]

Thanks for writing this summary Andreas. I for one, am a lot clearer on 
this now, and in full agreement with your proposed course of action.

Thanks,

Chris



Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 5, 2006, at 18:48, Andreas L Delmelle wrote:

<snip />

To summarize this thread (it has taken long enough :-))

I thought it over a bit more, and what I'm currently working on (and  
will most likely finish during the weekend) is the following:

1) Basically keep the algorithm the way I recently altered it, but  
containing some additional processing for trailing inline FOs that  
end with a sequence of white-space. Determining this last bit is easy  
enough, since it just means that XMLWhiteSpaceHandler.inWhiteSpace  
will be false after handleWhiteSpace(). At the end of the block, we  
will do one more pass over all those trailing inlines, if any.
IMO, in the vast majority of use-cases there will be either zero, one  
or at most two of those, but theoretically this could be any  
number... If there are any, then if white-space-collapse has the  
default value of "true" there will be only one trailing white-space  
character left at that point, so this additional bit of processing  
will cost virtually nothing.

2) Simplify the CharIterator structure, in the sense that we'll still  
only need an iterator over FOText and Characters. Unless layout needs  
access to the iterators, I think charIterator() can be pushed down to  
be specific to FObjMixed, and then the overrides of this method can  
be removed from all other FOs apart from FOText and Character. For  
1), it could turn out handy if I add the possibility to iterate  
backwards until the last non-white-space is encountered...

3) Exclude markers (and their descendants) from white-space handling  
during refinement, for the mentioned reasons:
   * retrieve-marker's ancestor's white-space properties govern the  
treatment in this case
   * possibly page-break context is needed when dealing with  
alternating static-contents
   * retrieve-markers with retrieve-boundary="document"

3) of course means the recently enabled marker_bug.xml testcase will  
have to be disabled again until we find a way to tackle this in  
layout. I had thought of using XMLWhiteSpaceHandler itself for this,  
but the tricky part is that, once a Marker (and its descendants) have  
been white-space-treated, the stripped white-space is permanently  
gone, and since that same Marker can again be retrieved in a  
different context etc.

[end-of-thread, I hope ;-)]

Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 5, 2006, at 10:02, Chris Bowditch wrote:

> Andreas L Delmelle wrote:
>
>> I see a remote possibility to exclude the markers whose class- 
>> name  corresponds to at least one retrieve-marker that has an  
>> ancestor with  non-default white-space-treatment/-collapse. If no  
>> such retrieve- marker exists, the white-space can be collapsed  
>> during refinement.  All possible retrieve-markers in a page- 
>> sequence will, in any case,  always be available at the point  
>> where a given marker is processed  (and through them, also their  
>> ancestor-block's white-space related  props). I'll see what I can  
>> do about this ASAP, although I'm not sure  whether this will gain  
>> us much. The FOs are readily available, but  they need to be  
>> reached all the same.
>
> Now I'm not sure I follow your thinking here. How will you find  
> retrieve-markers from a marker FO when retrieve- 
> boundary="document" ???

'remote', I said, and too remote it seems. Thanks for pointing this  
out! If not, I'd probably have spent a few hours before bumping into  
this particular restriction...


Cheers,

Andreas


Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Chris Bowditch <bo...@hotmail.com>.
Andreas L Delmelle wrote:

> On Jan 4, 2006, at 13:10, Manuel Mall wrote:
>  

<snip/>

> Ouch! This was one thing I indeed completely lost track of: the  
> properties governing white-space-treatment and the like for the  
> corresponding retrieve-marker... To add to all the fun, there is  indeed 
> no way at all to solve this during refinement stage in a  generic way, 
> taking into account alternating static-contents (page- break context is 
> needed for this).

This is a tricky problem to solve.

<snip/>

> 
> To be on the safe side, it seems better if I revert at least partly.
> I think extracting the handleWhiteSpace() method into a separate  class 
> is still a good idea, even if only to avoid code-duplication  and to 
> have all the related logic together in one spot --no need to  blame 
> Jeremias for this thought :-)
> Combine this with the previous approach using the  
> RecursiveCharIterators. I haven't removed much of that code anyway,  
> didn't even rename the classes just yet, while they are currently  never 
> used recursively (=only deal with FOText and Characters).

Agreed

> I see a remote possibility to exclude the markers whose class-name  
> corresponds to at least one retrieve-marker that has an ancestor with  
> non-default white-space-treatment/-collapse. If no such retrieve- marker 
> exists, the white-space can be collapsed during refinement.  All 
> possible retrieve-markers in a page-sequence will, in any case,  always 
> be available at the point where a given marker is processed  (and 
> through them, also their ancestor-block's white-space related  props). 
> I'll see what I can do about this ASAP, although I'm not sure  whether 
> this will gain us much. The FOs are readily available, but  they need to 
> be reached all the same.

Now I'm not sure I follow your thinking here. How will you find 
retrieve-markers from a marker FO when retrieve-boundary="document" ???

Chris



Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <ma...@apache.org>.
> On Jan 4, 2006, at 13:10, Manuel Mall wrote:
>
<snip />
>
>> I am not quite sure what to recommend from here. May be along the
>> following lines:
>>
>> 1. Leave the current status quo including leave Andreas patch in the
>> system. At least it covers the most common scenario - whitespace
>> should
>> be removed for markers. Although it does it in the wrong place but we
>> don't have anything better yet.
>
> To be on the safe side, it seems better if I revert at least partly.
> I think extracting the handleWhiteSpace() method into a separate
> class is still a good idea, even if only to avoid code-duplication
> and to have all the related logic together in one spot --no need to
> blame Jeremias for this thought :-)
> Combine this with the previous approach using the
> RecursiveCharIterators. I haven't removed much of that code anyway,
> didn't even rename the classes just yet, while they are currently
> never used recursively (=only deal with FOText and Characters).
> I see a remote possibility to exclude the markers whose class-name
> corresponds to at least one retrieve-marker that has an ancestor with
> non-default white-space-treatment/-collapse. If no such retrieve-
> marker exists, the white-space can be collapsed during refinement.
> All possible retrieve-markers in a page-sequence will, in any case,
> always be available at the point where a given marker is processed
> (and through them, also their ancestor-block's white-space related
> props). I'll see what I can do about this ASAP, although I'm not sure
> whether this will gain us much. The FOs are readily available, but
> they need to be reached all the same.
>

Thanks Andreas, I'll be happy this with course of action.

>
> Cheers,
>
> Andreas
>

Manuel


Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 4, 2006, at 13:10, Manuel Mall wrote:

> I think I have bad news for all who weighed into this debate.
>
> It now appears to me that there was a very good reason for the  
> original
> version for the whitespace refinement algorithm not being run on
> markers. For markers refinement cannot be done in the context of the
> fo:marker as the actual property values (in this case the values for
> the white-space / linefeed related properties) can only be determined
> in the context of the fo:retrieve-marker.
<snip />

Ouch! This was one thing I indeed completely lost track of: the  
properties governing white-space-treatment and the like for the  
corresponding retrieve-marker... To add to all the fun, there is  
indeed no way at all to solve this during refinement stage in a  
generic way, taking into account alternating static-contents (page- 
break context is needed for this).

<snip />
> white-space should NOT be removed but Andreas change now does  
> remove it.

...which is indeed only allowed in case of default values for those  
props on the retrieve-marker. A bit too enthusiastic of me.

<snip />

> I am not quite sure what to recommend from here. May be along the
> following lines:
>
> 1. Leave the current status quo including leave Andreas patch in the
> system. At least it covers the most common scenario - whitespace  
> should
> be removed for markers. Although it does it in the wrong place but we
> don't have anything better yet.

To be on the safe side, it seems better if I revert at least partly.
I think extracting the handleWhiteSpace() method into a separate  
class is still a good idea, even if only to avoid code-duplication  
and to have all the related logic together in one spot --no need to  
blame Jeremias for this thought :-)
Combine this with the previous approach using the  
RecursiveCharIterators. I haven't removed much of that code anyway,  
didn't even rename the classes just yet, while they are currently  
never used recursively (=only deal with FOText and Characters).
I see a remote possibility to exclude the markers whose class-name  
corresponds to at least one retrieve-marker that has an ancestor with  
non-default white-space-treatment/-collapse. If no such retrieve- 
marker exists, the white-space can be collapsed during refinement.  
All possible retrieve-markers in a page-sequence will, in any case,  
always be available at the point where a given marker is processed  
(and through them, also their ancestor-block's white-space related  
props). I'll see what I can do about this ASAP, although I'm not sure  
whether this will gain us much. The FOs are readily available, but  
they need to be reached all the same.


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 4 Jan 2006 08:26 am, Manuel Mall wrote:
> On Wed, 4 Jan 2006 03:51 am, Andreas L Delmelle wrote:
> > On Jan 2, 2006, at 06:27, Manuel Mall wrote:
> > > On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:

> > BTW: there is another gap that isn't completely covered by my
> > alterations. Markers are always white-space-treated as inlines,
> > which would lead to incorrect results if a marker is retrieved in a
> > context like
> >
> > <fo:block><fo:retrieve-marker .../></fo:block>
> >
> > As I see it, this means that something like what I described above
> > will need to be considered for this case as well. If the marker is
> > retrieved as a child of an fo:inline, the currently produced result
> > will be correct.
> >
> > Since authors are allowed to have static-contents that retrieve the
> > same marker twice, once as child of a block and another as a child
> > of an inline, we can't possibly decide at FOTree stage if these
> > spaces may be removed.
>
> This is a very interesting point you are making here. I need to look
> into that a bit more.
>

I think I have bad news for all who weighed into this debate.

It now appears to me that there was a very good reason for the original 
version for the whitespace refinement algorithm not being run on 
markers. For markers refinement cannot be done in the context of the 
fo:marker as the actual property values (in this case the values for 
the white-space / linefeed related properties) can only be determined 
in the context of the fo:retrieve-marker. In this example:

<fo:block background-color="yellow" white-space-collapse="false">
     <fo:retrieve-marker retrieve-class-name="m1" />
</fo:block>
...
<fo:marker marker-class-name="m1">
   <fo:block>
       First   marker  with  whitespace  around
   </fo:block>
</fo:marker>

white-space should NOT be removed but Andreas change now does remove it.

There have been endless discussions on property inheritance in the 
context of markers on this list before and even this issue was raised 
before: http://marc.theaimsgroup.com/?l=fop-dev&m=110254108019344&w=2.

Where does this leave us?

1. The patch is not solving the whitespace handling problem for markers 
which was one of its initial drivers. We can blame Jeremias here - just 
to drag in another innocent party :-) - as he suggested factoring out 
the fo:block specific whitespace refinement so it can be applied to 
markers. Unfortunately that was a bad idea.

2. Because of the marker issue we need to have whitespace handling in 
layout before or as part of the Knuth element generation.

I am not quite sure what to recommend from here. May be along the 
following lines:

1. Leave the current status quo including leave Andreas patch in the 
system. At least it covers the most common scenario - whitespace should 
be removed for markers. Although it does it in the wrong place but we 
don't have anything better yet.

2. Add a testcase which shows the incorrect whitespace handling for 
markers so we have a record of this. I can do that as I have basically 
written a testcase as part of this investigation.

3. Put some effort into the Knuth element generation for line building 
area as this is all interrelated:
	whitespace handling
	UAX#14 line breaking
	Handling of unicode spaces, zwsp, etc

<snip/>
> >
> > Cheers,
> >
> > Andreas
>

Regards

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Chris Bowditch <bo...@hotmail.com>.
Manuel Mall wrote:
> On Wed, 4 Jan 2006 03:51 am, Andreas L Delmelle wrote:
>>

Sorry to interject into this debate, but I have to say that I agree with 
Manuel and thought I'd better speak up as this debate doesn't appear to 
be making any progress.

Thanks for trying to improve this important area of the code Andreas, I 
don't want to appear ungrateful for your efforts, it's just I have 
similar concerns to Manuel.

>>To sum it up:
>>Our implementation of Donald Knuth's algorithm first creates the
>>element lists for the FOs, and then from those lists it calculates
>>the most favorable break-positions. Subsequently, it adds the areas
>>based on those breaks to the block-area, right?
>>Now, what I mean:
>>If the element-lists for the trailing spaces(*) are modeled
>>appropriately, and we add a forced break (infinite penalty) for the
>>end-of-block, then the algorithm will always create one final pseudo-
>>line-break(**) where those spaces are dissolved if present, just as
>>they would be when it were the first line. The generated pseudo-line
>>(s) will have no content at all. Maybe a minor tweak needed in
>>LineArea to return zero BPD when it has no child-areas, and there we
>>go... In Block.addChildArea, we can then test for zero-BPD line-areas
>>to keep them from effectively being added to the block.
>>
>>Something like that? Or am I still missing important implications?
>>

I think the important point is that the Knuth algorithm cannot be made 
to strip trailing spaces. Only by placing hacky code around the 
algorithm can this effect been achieved. Code which from my perspective 
has caused a lot of bugs and unwanted side effects. Bugs which Jeremias 
and Manuel seem to be constantly fixing in this area. So I think leading 
and trailing space removal should be kept in the refinement (FO Tree) 
stage for this reason.

Also, as Manuel pointed out, the Knuth algorithm does not handle cross 
LM space removal. Something which can be achieved more easily in the FO 
Tree.

<snip/>

Chris



Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 4 Jan 2006 03:51 am, Andreas L Delmelle wrote:
> On Jan 2, 2006, at 06:27, Manuel Mall wrote:
> > On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
> >> Would it not be a much easier and much
> >> more straightforward solution to have every paragraph end with an
> >> infinitely low penalty, so that the algorithm eventually treats
> >> trailing spaces in the last line-area just the same as it would
> >> for 'normal' line-breaks?
> >
> > No, leading and trailing paragraph spaces must be removed BEFORE
> > linebreaking, that is before we get into the Knuth stuff otherwise
> > they
> > may be incorrectly considered as part of the linebreaking line
> > length and adjustment calculations. Therefore when this was done
> > during refinement at the block level it was just the right place
> > IMO. Obviously spaces around formatter generated linebreaks must be
> > dealt with during linebreaking.
>
> Hmm... Yes, yes. We are growing closer. I think I like you. Well,
> actually, I'm growing a bit tired of this debate, but that's a Very
> Good Sign, if you catch the drift. :-)
>
> To sum it up:
> Our implementation of Donald Knuth's algorithm first creates the
> element lists for the FOs, and then from those lists it calculates
> the most favorable break-positions. Subsequently, it adds the areas
> based on those breaks to the block-area, right?
> Now, what I mean:
> If the element-lists for the trailing spaces(*) are modeled
> appropriately, and we add a forced break (infinite penalty) for the
> end-of-block, then the algorithm will always create one final pseudo-
> line-break(**) where those spaces are dissolved if present, just as
> they would be when it were the first line. The generated pseudo-line
> (s) will have no content at all. Maybe a minor tweak needed in
> LineArea to return zero BPD when it has no child-areas, and there we
> go... In Block.addChildArea, we can then test for zero-BPD line-areas
> to keep them from effectively being added to the block.
>
> Something like that? Or am I still missing important implications?
>

The point you are missing is that the Knuth algorithm only deletes 
leading spaces in a line because it always breaks at the first of a 
sequence of spaces. Therefore adding an infinite penalty at the end of 
the paragraph doesn't achieve anything with respect to space removal. 
And BTW we do add an infinite penalty at the end of a paragraph 
already.

> (*) this made me wonder BTW in how many percent of the cases an
> fo:inline with a trailing space would actually end an fo:block.
> Anyone care to make an educated guess?
>
> (**) more than one in the very exceptional case where the trailing
> spaces would cause a line-break themselves, i.e. if there is just
> enough IPD left for one space, and we have more than one... but that
> would mean nested-nested-...-nested trailing fo:inlines, or one
> fo:inline with lots of non-collapsed spaces.
>

Not sure if this consideration is relevant.

> <snip />
>
> > That is not the point at all. The previous algorithm was defective
> > in the sense of not dealing with whitespace around markers and
> > possibly other fo's with text content.
>
> OK, so it is an improvement after all.
> Phew, <wipes forehead />, I almost thought I had become utterly
> useless... :-)
>
> > The task at hand was to extend the whitespace handling to other
> > fo's which were previously omitted, e.g. markers. Your change does
> > that however, it does not preserve the existing functionality.
> > Therefore its
> > progress in one sense and regression in another. What I am asking
> > you to do is to look for a solution were we don't have any
> > regressions and still get the whitespace handling applied to other
> > fos.
>
> See my above description: it can be done with much less effort IIC,
> both efficiency- and code-wise, if this particular step is left to
> the layout algorithm.

That's were we disagree - we had a simple working solution before your 
patch - I like to have that back. Putting it into layout is a non 
trivial exercise because it requires "cross fo/lm border" processing. 
This is something layout currently doesn't do but the whitespace 
routine at fo level before your patch did do. That's why I like it so 
much :-).

>
> BTW: there is another gap that isn't completely covered by my
> alterations. Markers are always white-space-treated as inlines, which
> would lead to incorrect results if a marker is retrieved in a context
> like
>
> <fo:block><fo:retrieve-marker .../></fo:block>
>
> As I see it, this means that something like what I described above
> will need to be considered for this case as well. If the marker is
> retrieved as a child of an fo:inline, the currently produced result
> will be correct.
>
> Since authors are allowed to have static-contents that retrieve the
> same marker twice, once as child of a block and another as a child of
> an inline, we can't possibly decide at FOTree stage if these spaces
> may be removed.
>

This is a very interesting point you are making here. I need to look 
into that a bit more.

> > BTW, if you had mentioned the regression in your patch description
> > I would have raised my objections at that time. You only mentioned
> > it after you applied the patch.
>
> True enough, I hadn't considered that. No harm intended and none
> taken, I hope...

Of course not.

>
> Anyway, up to here, this has yet again been a very stimulating
> discussion. Thanks for insisting on my reconsidering and rephrasing
> of ideas. At the start, I only *sensed* it was possible and desirable
> to move this to layout. Now I'm certain that it is not only possible,
> but also mandatory to do so, if we want to cover virtually all cases.
>
>
> Cheers,
>
> Andreas

Regards

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 2, 2006, at 06:27, Manuel Mall wrote:

> On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
>> Would it not be a much easier and much
>> more straightforward solution to have every paragraph end with an
>> infinitely low penalty, so that the algorithm eventually treats
>> trailing spaces in the last line-area just the same as it would for
>> 'normal' line-breaks?
>
> No, leading and trailing paragraph spaces must be removed BEFORE
> linebreaking, that is before we get into the Knuth stuff otherwise  
> they
> may be incorrectly considered as part of the linebreaking line length
> and adjustment calculations. Therefore when this was done during
> refinement at the block level it was just the right place IMO.
> Obviously spaces around formatter generated linebreaks must be dealt
> with during linebreaking.

Hmm... Yes, yes. We are growing closer. I think I like you. Well,  
actually, I'm growing a bit tired of this debate, but that's a Very  
Good Sign, if you catch the drift. :-)

To sum it up:
Our implementation of Donald Knuth's algorithm first creates the  
element lists for the FOs, and then from those lists it calculates  
the most favorable break-positions. Subsequently, it adds the areas  
based on those breaks to the block-area, right?
Now, what I mean:
If the element-lists for the trailing spaces(*) are modeled  
appropriately, and we add a forced break (infinite penalty) for the  
end-of-block, then the algorithm will always create one final pseudo- 
line-break(**) where those spaces are dissolved if present, just as  
they would be when it were the first line. The generated pseudo-line 
(s) will have no content at all. Maybe a minor tweak needed in  
LineArea to return zero BPD when it has no child-areas, and there we  
go... In Block.addChildArea, we can then test for zero-BPD line-areas  
to keep them from effectively being added to the block.

Something like that? Or am I still missing important implications?

(*) this made me wonder BTW in how many percent of the cases an  
fo:inline with a trailing space would actually end an fo:block.  
Anyone care to make an educated guess?

(**) more than one in the very exceptional case where the trailing  
spaces would cause a line-break themselves, i.e. if there is just  
enough IPD left for one space, and we have more than one... but that  
would mean nested-nested-...-nested trailing fo:inlines, or one  
fo:inline with lots of non-collapsed spaces.

<snip />
> That is not the point at all. The previous algorithm was defective in
> the sense of not dealing with whitespace around markers and possibly
> other fo's with text content.

OK, so it is an improvement after all.
Phew, <wipes forehead />, I almost thought I had become utterly  
useless... :-)

> The task at hand was to extend the whitespace handling to other fo's
> which were previously omitted, e.g. markers. Your change does that
> however, it does not preserve the existing functionality. Therefore  
> its
> progress in one sense and regression in another. What I am asking you
> to do is to look for a solution were we don't have any regressions and
> still get the whitespace handling applied to other fos.

See my above description: it can be done with much less effort IIC,  
both efficiency- and code-wise, if this particular step is left to  
the layout algorithm.

BTW: there is another gap that isn't completely covered by my  
alterations. Markers are always white-space-treated as inlines, which  
would lead to incorrect results if a marker is retrieved in a context  
like

<fo:block><fo:retrieve-marker .../></fo:block>

As I see it, this means that something like what I described above  
will need to be considered for this case as well. If the marker is  
retrieved as a child of an fo:inline, the currently produced result  
will be correct.

Since authors are allowed to have static-contents that retrieve the  
same marker twice, once as child of a block and another as a child of  
an inline, we can't possibly decide at FOTree stage if these spaces  
may be removed.

> BTW, if you had mentioned the regression in your patch description I
> would have raised my objections at that time. You only mentioned it
> after you applied the patch.

True enough, I hadn't considered that. No harm intended and none  
taken, I hope...

Anyway, up to here, this has yet again been a very stimulating  
discussion. Thanks for insisting on my reconsidering and rephrasing  
of ideas. At the start, I only *sensed* it was possible and desirable  
to move this to layout. Now I'm certain that it is not only possible,  
but also mandatory to do so, if we want to cover virtually all cases.


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
> On Jan 1, 2006, at 17:15, Manuel Mall wrote:
> > The Knuth algorithm (read the paper) deals only with box/pen/glue
> > for the purpose of breaking lines and if it breaks a line it takes
> > certain actions with respect to discarding pen/glue elements
> > directly following
> > the break it created. If it doesn't create a line break it leaves
> > everything as it is. This means everything at the beginning and end
> > of a paragraph is left untouched. line-feed-treatment at the
> > beginning and
> > end of a paragraph is not influenced by the Knuth algorithm and
> > therefore cannot be controlled by whatever sequences we generate.
>
> Ahem... I do get your point, but the fact of the matter remains that
> the trailing spaces should be removed for the reason that they would
> end up at the end of a *line-area*, not because they end up at the
> end of the *paragraph*.
>
> I have no trouble grasping the idea that the Knuth algorithm only
> creates effective breaks in intermediate positions, and takes certain
> actions for those breaks. Ok, so that means the start- or end-of-
> paragraph line-break is not created by this algorithm in itself, and
> remains out-of-scope here. Would it not be a much easier and much
> more straightforward solution to have every paragraph end with an
> infinitely low penalty, so that the algorithm eventually treats
> trailing spaces in the last line-area just the same as it would for
> 'normal' line-breaks?

No, leading and trailing paragraph spaces must be removed BEFORE 
linebreaking, that is before we get into the Knuth stuff otherwise they 
may be incorrectly considered as part of the linebreaking line length 
and adjustment calculations. Therefore when this was done during 
refinement at the block level it was just the right place IMO. 
Obviously spaces around formatter generated linebreaks must be dealt 
with during linebreaking.

>
> > We can control line-feed-treatment at Knuth generated breaks by
> > constructing the proper sequences which we will do eventually. But
> > start/end paragraph is outside of that which is why I am keen to
> > push it into the FO refinement stage (as it used to be).
>
> As I said, it's all the same to me. If you (and a few others, of
> course) think we were better off before I committed my changes, then
> by all means, go ahead and revert... I did my homework, and posted it
> as a patch for review first. As I recall, only Finn had anything to
> add, and his comment was taken into account. The rest of you remained
> silent, which I consider to be at least a '+0' (= go ahead if you
> want to, but don't expect any assistance from us, because we already
> have our hands full).
>

That is not the point at all. The previous algorithm was defective in 
the sense of not dealing with whitespace around markers and possibly 
other fo's with text content.

The task at hand was to extend the whitespace handling to other fo's 
which were previously omitted, e.g. markers. Your change does that 
however, it does not preserve the existing functionality. Therefore its 
progress in one sense and regression in another. What I am asking you 
to do is to look for a solution were we don't have any regressions and 
still get the whitespace handling applied to other fos.

BTW, if you had mentioned the regression in your patch description I 
would have raised my objections at that time. You only mentioned it 
after you applied the patch.

>
> Cheers,
>
> Andreas

Regards

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 1, 2006, at 17:15, Manuel Mall wrote:

> The Knuth algorithm (read the paper) deals only with box/pen/glue for
> the purpose of breaking lines and if it breaks a line it takes certain
> actions with respect to discarding pen/glue elements directly  
> following
> the break it created. If it doesn't create a line break it leaves
> everything as it is. This means everything at the beginning and end of
> a paragraph is left untouched. line-feed-treatment at the beginning  
> and
> end of a paragraph is not influenced by the Knuth algorithm and
> therefore cannot be controlled by whatever sequences we generate.

Ahem... I do get your point, but the fact of the matter remains that  
the trailing spaces should be removed for the reason that they would  
end up at the end of a *line-area*, not because they end up at the  
end of the *paragraph*.

I have no trouble grasping the idea that the Knuth algorithm only  
creates effective breaks in intermediate positions, and takes certain  
actions for those breaks. Ok, so that means the start- or end-of- 
paragraph line-break is not created by this algorithm in itself, and  
remains out-of-scope here. Would it not be a much easier and much  
more straightforward solution to have every paragraph end with an  
infinitely low penalty, so that the algorithm eventually treats  
trailing spaces in the last line-area just the same as it would for  
'normal' line-breaks?

> We can control line-feed-treatment at Knuth generated breaks by
> constructing the proper sequences which we will do eventually. But
> start/end paragraph is outside of that which is why I am keen to push
> it into the FO refinement stage (as it used to be).

As I said, it's all the same to me. If you (and a few others, of  
course) think we were better off before I committed my changes, then  
by all means, go ahead and revert... I did my homework, and posted it  
as a patch for review first. As I recall, only Finn had anything to  
add, and his comment was taken into account. The rest of you remained  
silent, which I consider to be at least a '+0' (= go ahead if you  
want to, but don't expect any assistance from us, because we already  
have our hands full).


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Sun, 1 Jan 2006 10:22 pm, Andreas L Delmelle wrote:
> On Dec 31, 2005, at 17:02, Andreas L Delmelle wrote:
>
> (been pondering a bit more over this, and...)
>
> > Et voilà, that seems to be where the real *flaw* is located, if you
> > ask me. It should care about glues at the beginning of a line --
> > which it seems to handle perfectly ATM--
>
> In fact, this may currently be handled 'too perfectly'. One of the
> testcases --block_white-space_2.xml-- fails because a leading non-
> breaking space is removed, contrary to the expectation.
>
> Don't get me wrong. I still think that it is unnecessary to remove
> the mentioned trailing white-space for trailing nested inlines in a
> paragraph in the FOTree.
>
> Only, I think I'm beginning to see what is meant by this paradox:
> > Besides that, I get the impression you're somewhat contradicting
> > yourself here:
> > - in the comment on the failing testcase you noted that 'These
> > tests fail because the Knuth element sequences for consecutive
> > whitespace are not correct.'
> > - and now you're saying that it's not a matter of generating the
> > correct element sequences
>

You still don't seem to quite get my point.

The Knuth algorithm (read the paper) deals only with box/pen/glue for 
the purpose of breaking lines and if it breaks a line it takes certain 
actions with respect to discarding pen/glue elements directly following 
the break it created. If it doesn't create a line break it leaves 
everything as it is. This means everything at the beginning and end of 
a paragraph is left untouched. line-feed-treatment at the beginning and 
end of a paragraph is not influenced by the Knuth algorithm and 
therefore cannot be controlled by whatever sequences we generate.

We can control line-feed-treatment at Knuth generated breaks by 
constructing the proper sequences which we will do eventually. But 
start/end paragraph is outside of that which is why I am keen to push 
it into the FO refinement stage (as it used to be).

>
> Would this be a correct assessment?
>
>
> Cheers,
>
> Andreas

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Dec 31, 2005, at 17:02, Andreas L Delmelle wrote:

(been pondering a bit more over this, and...)

> Et voilà, that seems to be where the real *flaw* is located, if you  
> ask me. It should care about glues at the beginning of a line -- 
> which it seems to handle perfectly ATM--

In fact, this may currently be handled 'too perfectly'. One of the  
testcases --block_white-space_2.xml-- fails because a leading non- 
breaking space is removed, contrary to the expectation.

Don't get me wrong. I still think that it is unnecessary to remove  
the mentioned trailing white-space for trailing nested inlines in a  
paragraph in the FOTree.

Only, I think I'm beginning to see what is meant by this paradox:

> Besides that, I get the impression you're somewhat contradicting  
> yourself here:
> - in the comment on the failing testcase you noted that 'These  
> tests fail because the Knuth element sequences for consecutive  
> whitespace are not correct.'
> - and now you're saying that it's not a matter of generating the  
> correct element sequences

The flaw here is that, IIC, the element sequences generated for nbsp  
are basically the same as for a common space, leading to the exact  
same type of area being (or not being) added to the Area Tree (=  
<space .../>)

Somewhere the decision has to be made: do we or do we not add an area  
for this box/element? It's precisely there that the algorithm should  
make the evaluation, taking into consideration the white-space  
related properties and the underlying character's suppress-at-line- 
break property.

Would this be a correct assessment?


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Dec 31, 2005, at 16:05, Manuel Mall wrote:

>> [Me:]
>> Well, it's definitely not impossible, but I'm wondering a bit about
>> Cost vs. Benefit. Currently, when the trailing spaces for any inline
>> are treated --in Inline.endOfNode()-- one has no way of knowing
>> whether any text will still follow --possible subsequent nested
>> inlines, text or characters will not be available yet.
>>
>
> This indicates to me that your redesigned algorithm has the same flaws
> as we currently encounter with the inline layout manager structure.  
> Any
> problems which require looking across FO (= LM) boundaries suddenly
> become hard. BTW, the original block level whitespace handling
> refinement didn't have that problem as it had the whole block content
> to available to it. So I still think we have regressed here.

Maybe so... but I'm looking at this as taking a step backwards like  
one does before taking a leap.

Besides that, it is not a *flaw* per se. Strictly speaking, white- 
space collapsing/removal applies to sibling character nodes in the  
source document. The fact that leading white-space in a paragraph can  
be removed during refinement without any real extra effort is a  
convenience, a bonus that follows from the preceding text-nodes or  
inline-nodes already being processed (= the state indicated by the  
'inWhiteSpace' and 'afterLinefeed' variables can be carried over).  
There is no need for look-behind here (the previous algorithm didn't  
do so either).

The possible problem I saw with the block-level white-space handling  
was that all white-space characters would continue to take up memory  
until the first nested block or in the worst case, until the end-of- 
block. In case of large blocks with lots of indents due to pretty- 
printing, the current approach makes these spaces disappear much  
sooner (= more memory-efficient).

When I talk about cost/benefit, I refer to the fact that we already  
get two passes over the same character sequences:
- once when building the FOTree
- another when performing layout

In order to implement this trailing white-space removal for nested  
trailing inlines during refinement --I can't stress it enough: a  
*purely* aesthetical matter; the conceptual/logical necessity still  
escapes me...-- we would have to add a third pass.

>> In theory, we could keep a reference alive to the last FOText of the
>> previous inline, so that when it appears at the end of the block, we
>> could strip its trailing white-space too.
>
> Yes, that is what you get when doing this fo centric. You have to keep
> context / state / global variables to deal with "cross border" issues.

Carrying over the context is no problem when it comes to previous  
nodes, but you simply don't have the luxury of look-ahead in the  
FOTree --that is, look-ahead is limited to the nodes already  
availiable at that point. One way to deal with it is to accumulate  
all nodes, and only process them at the end-of-block/nested blocks.  
This has the above mentioned drawback --space characters taking up  
resources far longer than strictly necessary.

OTOH, look-ahead in the FOTree isn't really required for anything  
(apart from maybe this particular scenario).
The layout algorithm *needs* to be able to move/look in both  
directions anyway, so AFAICT, it shouldn't be too much effort to  
handle trailing spaces for trailing nested inlines there... If that  
is such a difficult matter, then one should doubt the layout- 
algorithm, if anything, instead of trying to work around the lack of  
look-ahead in the FOTree.

>> [Me:]
>> Apart from the aesthetic argument (nice symmetry): why exactly?
>> Again, IMO, if the right element-sequences are generated for these
>> white-spaces, they should be suppressed at the end of the paragraph
>> anyway (forced EOL).
>>
>
> Its not a matter of generating the correct Knuth element sequences
> because the algorithm doesn't care about what is at the beginning or
> end of a paragraph. Giving the correct (= whitespace handled)  
> paragraph
> to the Knuth algorithm is a precondition. Again: line breaking deals
> with adding breaks at optimal allowable points within the text it
> doesn't care what's at the start and end.

Et voilà, that seems to be where the real *flaw* is located, if you  
ask me. It should care about glues at the beginning of a line --which  
it seems to handle perfectly ATM-- regardless of whether it's the  
first line in a paragraph or not. In the same way, it should care  
about glues at the end of a line, regardless of whether it is the  
last line in a paragraph or not.

Besides that, I get the impression you're somewhat contradicting  
yourself here:
- in the comment on the failing testcase you noted that 'These tests  
fail because the Knuth element sequences for consecutive whitespace  
are not correct.'
- and now you're saying that it's not a matter of generating the  
correct element sequences

Can you clarify? Doesn't this indicate that there is a difference in  
processing between the last line in a paragraph and all other  
lines... which seems inconsistent. A line is a line is a line, no  
matter at what position in the paragraph we find ourselves.


Cheers,

Andreas


Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Sat, 31 Dec 2005 09:23 pm, Andreas L Delmelle wrote:
> On Dec 31, 2005, at 08:26, Manuel Mall wrote:
> > On Sat, 31 Dec 2005 02:41 am, Andreas L Delmelle wrote:
> >> Point is: if trailing spaces in a line are correctly suppressed
> >> during line-building, the trailing spaces in the last inline of a
> >> given block would be removed in that step (no matter at what depth
> >> the inline is nested).
> >
> > the problem is that the Knuth algorithm doesn't deal with spaces
> > (glue)
> > at the end or beginning of a paragraph. It only discards space
> > (glue) when the algorithm creates a line break.
>
> Not always: see block_white-space-collapse_2.xml
> The reason why it fails is that the trailing spaces at the end of the
> first line aren't discarded. Specifying text-align="justify" makes
> the algorithm throw away the trailing spaces (maybe "end" or "right"
> too, haven't checked that yet)
>

These tests fail because the Knuth element sequences for consecutive 
whitespace are not correct. A sequence of whitespace currently 
generates a Knuth sequence (simplified) of the form:

pen - glue - pen - glue - pen - glue ....

This means every space becomes a valid break point. In the usual ignore 
scenario (white-space-treatment="ignore...") this is incorrect as the 
only valid break point should be the first space (and all be 
discarded). So the sequence should look more like:

pen - glue - glue - glue ....

The correct sequence for white-space-treatment="preserve" is more 
interesting, every space becomes something like:

 pen
 box w=0
 pen inf
 glue

The first penalty is the actual break possibility, the box prevents 
discarding of the following glue if the break is chosen, the infinite 
penalty prevents the glue from being a break possibility.

In summary the current Knuth sequences are incorrect and just happen to 
work in the special case of a single space that is under 
white-space-collapse=true and 
white-space-treatment="ignore-if-surrounding-linefeed". Luckily this is 
the most common scenario.

> > It is (messy?) FOP custom code outside the core Knuth algorithm
> > which deals with removing glue at the
> > beginning and end of a paragraph. This should IMO therefore dealt
> > with during refinement. I assume (haven't checked) that your
> > whitespace handling does remove all leading whitespace in a
> > paragraph and therefore it would make sense if it also removes all
> > trailing whitespace (nice symmetry :-)).
>
> Yeah, it would be a very nice symmetry :-)
> Well, it's definitely not impossible, but I'm wondering a bit about
> Cost vs. Benefit. Currently, when the trailing spaces for any inline
> are treated --in Inline.endOfNode()-- one has no way of knowing
> whether any text will still follow --possible subsequent nested
> inlines, text or characters will not be available yet.
>

This indicates to me that your redesigned algorithm has the same flaws 
as we currently encounter with the inline layout manager structure. Any 
problems which require looking across FO (= LM) boundaries suddenly 
become hard. BTW, the original block level whitespace handling 
refinement didn't have that problem as it had the whole block content 
to available to it. So I still think we have regressed here.

> In theory, we could keep a reference alive to the last FOText of the
> previous inline, so that when it appears at the end of the block, we
> could strip its trailing white-space too.

Yes, that is what you get when doing this fo centric. You have to keep 
context / state / global variables to deal with "cross border" issues.

> OTOH, if the white-space suppression in layout is made to work
> properly in all cases, those trailing spaces should automatically be
> removed since they are trailing in a line (whether it is the last
> line in the paragraph or not shouldn't make any difference).
>
> So, I held off FTM on trying to remove these spaces during
> refinement, and wanted to see if this problem doesn't get solved by
> tweaking the white-space removal during line-building.
>
> > Note that the point is that we don't need any special code to
> > discard whitespace around Knuth generated linebreaks as the
> > algorithm does that
> > for us (actually we need special code to prevent discards for
> > certain linefeed-treatment values but that is more of a matter of
> > generating Knuth sequences which allow breaks but don't discard and
> > does not require a change to the algorithms). Therefore the only
> > special case is
> > the beginning and end of a paragraph. As the beginning is handled
> > by whitespace handling at the FO level the end bit should be as
> > well.
>
> Apart from the aesthetic argument (nice symmetry): why exactly?
> Again, IMO, if the right element-sequences are generated for these
> white-spaces, they should be suppressed at the end of the paragraph
> anyway (forced EOL).
>

Its not a matter of generating the correct Knuth element sequences 
because the algorithm doesn't care about what is at the beginning or 
end of a paragraph. Giving the correct (= whitespace handled) paragraph 
to the Knuth algorithm is a precondition. Again: line breaking deals 
with adding breaks at optimal allowable points within the text it 
doesn't care what's at the start and end.

> In the end, it's all the same to me, I guess...
>
>
> Cheers,
>
> Andreas

Regards

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Dec 31, 2005, at 08:26, Manuel Mall wrote:

> On Sat, 31 Dec 2005 02:41 am, Andreas L Delmelle wrote:
>>
>> Point is: if trailing spaces in a line are correctly suppressed
>> during line-building, the trailing spaces in the last inline of a
>> given block would be removed in that step (no matter at what depth
>> the inline is nested).
>>

> the problem is that the Knuth algorithm doesn't deal with spaces  
> (glue)
> at the end or beginning of a paragraph. It only discards space (glue)
> when the algorithm creates a line break.

Not always: see block_white-space-collapse_2.xml
The reason why it fails is that the trailing spaces at the end of the  
first line aren't discarded. Specifying text-align="justify" makes  
the algorithm throw away the trailing spaces (maybe "end" or "right"  
too, haven't checked that yet)

> It is (messy?) FOP custom code outside the core Knuth algorithm  
> which deals with removing glue at the
> beginning and end of a paragraph. This should IMO therefore dealt with
> during refinement. I assume (haven't checked) that your whitespace
> handling does remove all leading whitespace in a paragraph and
> therefore it would make sense if it also removes all trailing
> whitespace (nice symmetry :-)).

Yeah, it would be a very nice symmetry :-)
Well, it's definitely not impossible, but I'm wondering a bit about  
Cost vs. Benefit. Currently, when the trailing spaces for any inline  
are treated --in Inline.endOfNode()-- one has no way of knowing  
whether any text will still follow --possible subsequent nested  
inlines, text or characters will not be available yet.

In theory, we could keep a reference alive to the last FOText of the  
previous inline, so that when it appears at the end of the block, we  
could strip its trailing white-space too.
OTOH, if the white-space suppression in layout is made to work  
properly in all cases, those trailing spaces should automatically be  
removed since they are trailing in a line (whether it is the last  
line in the paragraph or not shouldn't make any difference).

So, I held off FTM on trying to remove these spaces during  
refinement, and wanted to see if this problem doesn't get solved by  
tweaking the white-space removal during line-building.

> Note that the point is that we don't need any special code to discard
> whitespace around Knuth generated linebreaks as the algorithm does  
> that
> for us (actually we need special code to prevent discards for certain
> linefeed-treatment values but that is more of a matter of generating
> Knuth sequences which allow breaks but don't discard and does not
> require a change to the algorithms). Therefore the only special  
> case is
> the beginning and end of a paragraph. As the beginning is handled by
> whitespace handling at the FO level the end bit should be as well.

Apart from the aesthetic argument (nice symmetry): why exactly?  
Again, IMO, if the right element-sequences are generated for these  
white-spaces, they should be suppressed at the end of the paragraph  
anyway (forced EOL).

In the end, it's all the same to me, I guess...


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Sat, 31 Dec 2005 02:41 am, Andreas L Delmelle wrote:
> On Dec 30, 2005, at 16:50, Manuel Mall wrote:
> > On Fri, 30 Dec 2005 11:25 pm, Andreas L Delmelle wrote:
> >> Not really. Just had to draw a line somewhere... If you do it for
> >> the last inline in a block, then you would also have to do it for
> >> the last inline of the last inline of a block etc. Besides, you
> >> get a second pass anyway, when the line is built. All the trailing
> >> space- glyph-areas could be removed there (but they currently
> >> aren't anyway, depending on text-alignment).
> >
> > I am still not sure if this is not a step backwards.  Before the
> > model was: All whitespace handling apart from dealing with
> > whitespace around FOP generated linebreaks is done during the
> > initial refinement.
> >
> > Now this is not really the case any more and the line breaking
> > stuff would have to deal with treating whitespace in other places
> > than around
> > its own generated linebreaks as well. I was hoping we could get rid
> > of the trailing paragraph space removal code in the line breaking
> > algorithm as it is one of those areas causing trouble again and
> > again.
>
> Trailing spaces in a paragraph: yes, absolutely, which is why the
> trailing whitespace in any block is removed there (albeit only
> whitespace characters that are direct descendants of the block).
>
> Trailing spaces in a line: now *this* is where currently most of the
> tests fail. Trailing spaces are discarded only when you force text-
> align to justify (for example).
>
> Point is: if trailing spaces in a line are correctly suppressed
> during line-building, the trailing spaces in the last inline of a
> given block would be removed in that step (no matter at what depth
> the inline is nested).
>

Andreas,

the problem is that the Knuth algorithm doesn't deal with spaces (glue) 
at the end or beginning of a paragraph. It only discards space (glue) 
when the algorithm creates a line break. It is (messy?) FOP custom code 
outside the core Knuth algorithm which deals with removing glue at the 
beginning and end of a paragraph. This should IMO therefore dealt with 
during refinement. I assume (haven't checked) that your whitespace 
handling does remove all leading whitespace in a paragraph and 
therefore it would make sense if it also removes all trailing 
whitespace (nice symmetry :-)).

Note that the point is that we don't need any special code to discard 
whitespace around Knuth generated linebreaks as the algorithm does that 
for us (actually we need special code to prevent discards for certain 
linefeed-treatment values but that is more of a matter of generating 
Knuth sequences which allow breaks but don't discard and does not 
require a change to the algorithms). Therefore the only special case is 
the beginning and end of a paragraph. As the beginning is handled by 
whitespace handling at the FO level the end bit should be as well.

>
> Cheers,
>
> Andreas

Regards

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Simon Pepping <sp...@leverkruid.nl>.
On Fri, Dec 30, 2005 at 11:50:21PM +0800, Manuel Mall wrote:
> On Fri, 30 Dec 2005 11:25 pm, Andreas L Delmelle wrote:
> > On Dec 30, 2005, at 14:54, Manuel Mall wrote:
> > > On Fri, 30 Dec 2005 09:38 pm, Andreas L Delmelle wrote:
> > >> I explicitly excluded fo:leaders from white-space handling,
> > >> because for example:
> > >>
> > >> <fo:leader leader-pattern="use-content">   xxx   </fo:leader>
> > >>
> > >> Collapsing the three spaces to one may produce unintended results.
> > >>
> > >> OTOH, if you have a nested inline in a leader, then the
> > >> white-space for the inline will be treated...
> > >
> > > Is there an indication in the spec that whitespace around
> > > use-content leader patterns should be treated any different? If
> > > not, I would include it into the normal white space handling.
> >
> > This was more based on expectation than on anything I encountered in
> > the specs, I guess. The white-space around the leader --physically
> > outside of the fo:leader element-- is treated according to the type
> > of parent it belongs to. The spaces inside the content of the
> > fo:leader are left alone. Somehow, even with white-space-
> > collapse="true", I'd much more expect the end result to mimic:
> >
> > <fo:leader leader-pattern="use-content">...xxx...</fo:leader>
> >
> > than
> >
> > <fo:leader leader-pattern="use-content"> xxx </fo:leader>
> >
> 
> Actually I wouldn't (assuming default white space handling property 
> values). What do others think?

I agree with Manuel. The white-space-collapse value holds
everywhere. The user must provide a value of false if he wants a
leader pattern with multiple adjacent spaces.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl


Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Dec 30, 2005, at 16:50, Manuel Mall wrote:

> On Fri, 30 Dec 2005 11:25 pm, Andreas L Delmelle wrote:
>> Not really. Just had to draw a line somewhere... If you do it for the
>> last inline in a block, then you would also have to do it for the
>> last inline of the last inline of a block etc. Besides, you get a
>> second pass anyway, when the line is built. All the trailing space-
>> glyph-areas could be removed there (but they currently aren't anyway,
>> depending on text-alignment).
>>
>
> I am still not sure if this is not a step backwards.  Before the model
> was: All whitespace handling apart from dealing with whitespace around
> FOP generated linebreaks is done during the initial refinement.
>
> Now this is not really the case any more and the line breaking stuff
> would have to deal with treating whitespace in other places than  
> around
> its own generated linebreaks as well. I was hoping we could get rid of
> the trailing paragraph space removal code in the line breaking
> algorithm as it is one of those areas causing trouble again and again.

Trailing spaces in a paragraph: yes, absolutely, which is why the  
trailing whitespace in any block is removed there (albeit only  
whitespace characters that are direct descendants of the block).

Trailing spaces in a line: now *this* is where currently most of the  
tests fail. Trailing spaces are discarded only when you force text- 
align to justify (for example).

Point is: if trailing spaces in a line are correctly suppressed  
during line-building, the trailing spaces in the last inline of a  
given block would be removed in that step (no matter at what depth  
the inline is nested).


Cheers,

Andreas

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Fri, 30 Dec 2005 11:25 pm, Andreas L Delmelle wrote:
> On Dec 30, 2005, at 14:54, Manuel Mall wrote:
> > On Fri, 30 Dec 2005 09:38 pm, Andreas L Delmelle wrote:
> > <snip/>
> >
> >> Case not covered by the altered code (but I didn't think it to be
> >> a showstopper):
> >> <snip />
> >
> > Hmm, isn't that a step backwards from the status before you applied
> > the
> > patch?
>
> Not really. Just had to draw a line somewhere... If you do it for the
> last inline in a block, then you would also have to do it for the
> last inline of the last inline of a block etc. Besides, you get a
> second pass anyway, when the line is built. All the trailing space-
> glyph-areas could be removed there (but they currently aren't anyway,
> depending on text-alignment).
>

I am still not sure if this is not a step backwards.  Before the model 
was: All whitespace handling apart from dealing with whitespace around 
FOP generated linebreaks is done during the initial refinement.

Now this is not really the case any more and the line breaking stuff 
would have to deal with treating whitespace in other places than around 
its own generated linebreaks as well. I was hoping we could get rid of 
the trailing paragraph space removal code in the line breaking 
algorithm as it is one of those areas causing trouble again and again.

> >> I explicitly excluded fo:leaders from white-space handling,
> >> because for example:
> >>
> >> <fo:leader leader-pattern="use-content">   xxx   </fo:leader>
> >>
> >> Collapsing the three spaces to one may produce unintended results.
> >>
> >> OTOH, if you have a nested inline in a leader, then the
> >> white-space for the inline will be treated...
> >
> > Is there an indication in the spec that whitespace around
> > use-content leader patterns should be treated any different? If
> > not, I would include it into the normal white space handling.
>
> This was more based on expectation than on anything I encountered in
> the specs, I guess. The white-space around the leader --physically
> outside of the fo:leader element-- is treated according to the type
> of parent it belongs to. The spaces inside the content of the
> fo:leader are left alone. Somehow, even with white-space-
> collapse="true", I'd much more expect the end result to mimic:
>
> <fo:leader leader-pattern="use-content">...xxx...</fo:leader>
>
> than
>
> <fo:leader leader-pattern="use-content"> xxx </fo:leader>
>

Actually I wouldn't (assuming default white space handling property 
values). What do others think?

> <snip />
>
> Cheers and Best Wishes for 2006.
>
> Andreas

Same to you

Manuel

Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Dec 30, 2005, at 14:54, Manuel Mall wrote:

> On Fri, 30 Dec 2005 09:38 pm, Andreas L Delmelle wrote:
> <snip/>
>> Case not covered by the altered code (but I didn't think it to be a
>> showstopper):
>> <snip />
> Hmm, isn't that a step backwards from the status before you applied  
> the
> patch?
>

Not really. Just had to draw a line somewhere... If you do it for the  
last inline in a block, then you would also have to do it for the  
last inline of the last inline of a block etc. Besides, you get a  
second pass anyway, when the line is built. All the trailing space- 
glyph-areas could be removed there (but they currently aren't anyway,  
depending on text-alignment).

>> I explicitly excluded fo:leaders from white-space handling, because
>> for example:
>>
>> <fo:leader leader-pattern="use-content">   xxx   </fo:leader>
>>
>> Collapsing the three spaces to one may produce unintended results.
>>
>> OTOH, if you have a nested inline in a leader, then the white-space
>> for the inline will be treated...
>>
> Is there an indication in the spec that whitespace around use-content
> leader patterns should be treated any different? If not, I would
> include it into the normal white space handling.

This was more based on expectation than on anything I encountered in  
the specs, I guess. The white-space around the leader --physically  
outside of the fo:leader element-- is treated according to the type  
of parent it belongs to. The spaces inside the content of the  
fo:leader are left alone. Somehow, even with white-space- 
collapse="true", I'd much more expect the end result to mimic:

<fo:leader leader-pattern="use-content">...xxx...</fo:leader>

than

<fo:leader leader-pattern="use-content"> xxx </fo:leader>

<snip />
>>
> Didn't your patch fix the marker_bug.xml testcase? If so it can  
> come out
> of the disabled-testcases.

Yep, it did. Completely forgot about that. Thanks for pointing out.


Cheers and Best Wishes for 2006.

Andreas


Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/

Posted by Manuel Mall <mm...@arcus.com.au>.
On Fri, 30 Dec 2005 09:38 pm, Andreas L Delmelle wrote:
> On Dec 30, 2005, at 14:33, adelmelle@apache.org wrote:
<snip/>
> Case not covered by the altered code (but I didn't think it to be a
> showstopper):
>
> If you have:
>    <fo:block>
>      <fo:inline>some inline text _
> ____</fo:inline>_
> __</fo:block>
>
>
> Currently, the first series of underlined white-space is not
> completely suppressed. It will at most be collapsed to a single
> space. The second series --between endInline() and endBlock()-- is
> completely suppressed because handleWhiteSpace() was called from
> Block.endOfNode().
>
Hmm, isn't that a step backwards from the status before you applied the 
patch?

> I explicitly excluded fo:leaders from white-space handling, because
> for example:
>
> <fo:leader leader-pattern="use-content">   xxx   </fo:leader>
>
> Collapsing the three spaces to one may produce unintended results.
>
> OTOH, if you have a nested inline in a leader, then the white-space
> for the inline will be treated...
>
Is there an indication in the spec that whitespace around use-content 
leader patterns should be treated any different? If not, I would 
include it into the normal white space handling.

> For the rest only a few minor updates to related test-cases:
> - block_white-space-collapse_2.xml: see info disabled-testcases.xml
> - leader_text-align.xml / leader_toc.xml: update of the expected ipd
> values; they seemed to ignore preserved spaces
>
Didn't your patch fix the marker_bug.xml testcase? If so it can come out 
of the disabled-testcases.

<snip/>
> Cheers,
>
> Andreas
Regards

Manuel