Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2006/01/01 15:22:46 UTC
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
On Dec 31, 2005, at 17:02, Andreas L Delmelle wrote:
(been pondering a bit more over this, and...)
> Et voilà, that seems to be where the real *flaw* is located, if you
> ask me. It should care about glues at the beginning of a line --
> which it seems to handle perfectly ATM--
In fact, this may currently be handled 'too perfectly'. One of the
testcases --block_white-space_2.xml-- fails because a leading non-
breaking space is removed, contrary to the expectation.
Don't get me wrong. I still think that it is unnecessary to remove
the mentioned trailing white-space for trailing nested inlines in a
paragraph in the FOTree.
Only, I think I'm beginning to see what is meant by this paradox:
> Besides that, I get the impression you're somewhat contradicting
> yourself here:
> - in the comment on the failing testcase you noted that 'These
> tests fail because the Knuth element sequences for consecutive
> whitespace are not correct.'
> - and now you're saying that it's not a matter of generating the
> correct element sequences
The flaw here is that, IIC, the element sequences generated for nbsp
are basically the same as for a common space, leading to the exact
same type of area being (or not being) added to the Area Tree (=
<space .../>)
Somewhere the decision has to be made: do we or do we not add an area
for this box/element? It's precisely there that the algorithm should
make the evaluation, taking into consideration the white-space
related properties and the underlying character's suppress-at-line-
break property.
Would this be a correct assessment?
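
The decision described above could be sketched roughly as follows. This is an illustrative Python model, not FOP code: the names `suppressed_at_line_break` and `leading_areas` are hypothetical. It only encodes the XSL-FO rule that, with suppress-at-line-break="auto", the plain space U+0020 is suppressed next to a line break while every other character, including the non-breaking space U+00A0, is retained.

```python
SPACE = "\u0020"
NBSP = "\u00a0"


def suppressed_at_line_break(ch: str, prop: str = "auto") -> bool:
    """Return True if `ch` should produce no area next to a line break.

    `prop` models the suppress-at-line-break property of one character.
    """
    if prop == "suppress":
        return True
    if prop == "retain":
        return False
    if prop == "auto":
        # Only the plain space is suppressed by default.
        return ch == SPACE
    raise ValueError("unknown suppress-at-line-break value: " + prop)


def leading_areas(line: str) -> str:
    """Drop characters at the start of a line that are suppressed there.

    Every character is assumed to carry the default value "auto".
    """
    i = 0
    while i < len(line) and suppressed_at_line_break(line[i]):
        i += 1
    return line[i:]
```

With this model, a line starting with a plain space loses it, while a line starting with a non-breaking space keeps it, which is the behaviour block_white-space_2.xml expects.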
Cheers,
Andreas
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
That proves the point that I shouldn't meddle in things I don't fully
understand, yet, and don't have enough time to really get to know.
Lesson learnt.
On 04.01.2006 13:10:42 Manuel Mall wrote:
<snip/>
> 1. The patch is not solving the whitespace handling problem for markers
> which was one of its initial drivers. We can blame Jeremias here - just
> to drag in another innocent party :-) - as he suggested factoring out
> the fo:block specific whitespace refinement so it can be applied to
> markers. Unfortunately that was a bad idea.
<snip/>
Jeremias Maerki
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Manuel Mall <mm...@arcus.com.au>.
On Fri, 6 Jan 2006 04:56 am, Andreas L Delmelle wrote:
> On Jan 5, 2006, at 18:48, Andreas L Delmelle wrote:
>
> <snip />
>
> To summarize this thread (it has taken long enough :-))
>
> I thought it over a bit more, and what I'm currently working on (and
> will most likely finish during the weekend) is the following:
>
> 1) Basically keep the algorithm the way I recently altered it, but
> containing some additional processing for trailing inline FOs that
> end with a sequence of white-space. Determining this last bit is easy
> enough, since it just means that XMLWhiteSpaceHandler.inWhiteSpace
> will be false after handleWhiteSpace(). At the end of the block, we
> will do one more pass over all those trailing inlines, if any.
> IMO, in the vast majority of use-cases there will be either zero, one
> or at most two of those, but theoretically this could be any
> number... If there are any, then if white-space-collapse has the
> default value of "true" there will be only one trailing white-space
> character left at that point, so this additional bit of processing
> will cost virtually nothing.
>
> 2) Simplify the CharIterator structure, in the sense that we'll still
> only need an iterator over FOText and Characters. Unless layout needs
> access to the iterators, I think charIterator() can be pushed down to
> be specific to FObjMixed, and then the overrides of this method can
> be removed from all other FOs apart from FOText and Character. For
> 1), it could turn out handy if I add the possibility to iterate
> backwards until the last non-white-space is encountered...
>
> 3) Exclude markers (and their descendants) from white-space handling
> during refinement, for the mentioned reasons:
> * retrieve-marker's ancestor's white-space properties govern the
> treatment in this case
> * possibly page-break context is needed when dealing with
> alternating static-contents
> * retrieve-markers with retrieve-boundary="document"
>
> 3) of course means the recently enabled marker_bug.xml testcase will
> have to be disabled again until we find a way to tackle this in
> layout. I had thought of using XMLWhiteSpaceHandler itself for this,
> but the tricky part is that, once a Marker (and its descendants) have
> been white-space-treated, the stripped white-space is permanently
> gone, and since that same Marker can again be retrieved in a
> different context etc.
>
> [end-of-thread, I hope ;-)]
>
Thanks for the summary and yes I think we are at the end of this one.
Personally I would not do 3) at this point in time, that is I would not
exclude markers from the whitespace refinement. IMO the whitespace
handling properties will have their default values (or matching values
in the marker and retrieve-marker contexts) most of the time and
therefore the current handling produces better results more often than
by reverting that part of the patch. But this is a judgement call and I
am not really fussed. There is a testcase which shows how it fails when
the properties are not matching and this should suffice to document the
problem.
> Cheers,
>
> Andreas
Manuel
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Chris Bowditch <bo...@hotmail.com>.
Andreas L Delmelle wrote:
<snip what="excellent summary"/>
>
> [end-of-thread, I hope ;-)]
Thanks for writing this summary Andreas. I for one, am a lot clearer on
this now, and in full agreement with your proposed course of action.
Thanks,
Chris
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 5, 2006, at 18:48, Andreas L Delmelle wrote:
<snip />
To summarize this thread (it has taken long enough :-))
I thought it over a bit more, and what I'm currently working on (and
will most likely finish during the weekend) is the following:
1) Basically keep the algorithm the way I recently altered it, but
containing some additional processing for trailing inline FOs that
end with a sequence of white-space. Determining this last bit is easy
enough, since it just means that XMLWhiteSpaceHandler.inWhiteSpace
will be false after handleWhiteSpace(). At the end of the block, we
will do one more pass over all those trailing inlines, if any.
IMO, in the vast majority of use-cases there will be either zero, one
or at most two of those, but theoretically this could be any
number... If there are any, then if white-space-collapse has the
default value of "true" there will be only one trailing white-space
character left at that point, so this additional bit of processing
will cost virtually nothing.
2) Simplify the CharIterator structure, in the sense that we'll still
only need an iterator over FOText and Characters. Unless layout needs
access to the iterators, I think charIterator() can be pushed down to
be specific to FObjMixed, and then the overrides of this method can
be removed from all other FOs apart from FOText and Character. For
1), it could turn out handy if I add the possibility to iterate
backwards until the last non-white-space is encountered...
3) Exclude markers (and their descendants) from white-space handling
during refinement, for the mentioned reasons:
* retrieve-marker's ancestor's white-space properties govern the
treatment in this case
* possibly page-break context is needed when dealing with
alternating static-contents
* retrieve-markers with retrieve-boundary="document"
3) of course means the recently enabled marker_bug.xml testcase will
have to be disabled again until we find a way to tackle this in
layout. I had thought of using XMLWhiteSpaceHandler itself for this,
but the tricky part is that, once a Marker (and its descendants) have
been white-space-treated, the stripped white-space is permanently
gone, and since that same Marker can again be retrieved in a
different context etc.
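
The extra pass over trailing inlines from point 1) might look something like the sketch below. It is written in Python for brevity and is only a model: the function name `strip_trailing_whitespace` is hypothetical, and the text of each trailing inline is represented as a plain string rather than an FOText or Character node. It walks backwards until the last non-white-space character is found, as suggested in point 2).

```python
XML_WHITESPACE = " \t\r\n"


def strip_trailing_whitespace(inlines):
    """Remove white-space from the end of a block.

    `inlines` holds the text of the block's trailing inlines in document
    order. The list is walked backwards and the walk stops at the first
    text that still contains a non-white-space character.
    """
    result = list(inlines)
    for i in range(len(result) - 1, -1, -1):
        stripped = result[i].rstrip(XML_WHITESPACE)
        result[i] = stripped
        if stripped:
            # Found the last non-white-space character; nothing before it
            # is trailing white-space.
            break
    return result
```

If white-space has already been collapsed, each visited string loses at most one character, so the pass is cheap.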
[end-of-thread, I hope ;-)]
Cheers,
Andreas
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 5, 2006, at 10:02, Chris Bowditch wrote:
> Andreas L Delmelle wrote:
>
>> I see a remote possibility to exclude the markers whose class-
>> name corresponds to at least one retrieve-marker that has an
>> ancestor with non-default white-space-treatment/-collapse. If no
>> such retrieve- marker exists, the white-space can be collapsed
>> during refinement. All possible retrieve-markers in a page-
>> sequence will, in any case, always be available at the point
>> where a given marker is processed (and through them, also their
>> ancestor-block's white-space related props). I'll see what I can
>> do about this ASAP, although I'm not sure whether this will gain
>> us much. The FOs are readily available, but they need to be
>> reached all the same.
>
> Now I'm not sure I follow your thinking here. How will you find
> retrieve-markers from a marker FO when retrieve-
> boundary="document" ???
'remote', I said, and too remote it seems. Thanks for pointing this
out! If not, I'd probably have spent a few hours before bumping into
this particular restriction...
Cheers,
Andreas
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Chris Bowditch <bo...@hotmail.com>.
Andreas L Delmelle wrote:
> On Jan 4, 2006, at 13:10, Manuel Mall wrote:
>
<snip/>
> Ouch! This was one thing I indeed completely lost track of: the
> properties governing white-space-treatment and the like for the
> corresponding retrieve-marker... To add to all the fun, there is indeed
> no way at all to solve this during refinement stage in a generic way,
> taking into account alternating static-contents (page- break context is
> needed for this).
This is a tricky problem to solve.
<snip/>
>
> To be on the safe side, it seems better if I revert at least partly.
> I think extracting the handleWhiteSpace() method into a separate class
> is still a good idea, even if only to avoid code-duplication and to
> have all the related logic together in one spot --no need to blame
> Jeremias for this thought :-)
> Combine this with the previous approach using the
> RecursiveCharIterators. I haven't removed much of that code anyway,
> didn't even rename the classes just yet, while they are currently never
> used recursively (=only deal with FOText and Characters).
Agreed
> I see a remote possibility to exclude the markers whose class-name
> corresponds to at least one retrieve-marker that has an ancestor with
> non-default white-space-treatment/-collapse. If no such retrieve- marker
> exists, the white-space can be collapsed during refinement. All
> possible retrieve-markers in a page-sequence will, in any case, always
> be available at the point where a given marker is processed (and
> through them, also their ancestor-block's white-space related props).
> I'll see what I can do about this ASAP, although I'm not sure whether
> this will gain us much. The FOs are readily available, but they need to
> be reached all the same.
Now I'm not sure I follow your thinking here. How will you find
retrieve-markers from a marker FO when retrieve-boundary="document" ???
Chris
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Manuel Mall <ma...@apache.org>.
> On Jan 4, 2006, at 13:10, Manuel Mall wrote:
>
<snip />
>
>> I am not quite sure what to recommend from here. Maybe along the
>> following lines:
>>
>> 1. Leave the current status quo, including leaving Andreas' patch in
>> the system. At least it covers the most common scenario - whitespace
>> should be removed for markers. It does so in the wrong place, but we
>> don't have anything better yet.
>
> To be on the safe side, it seems better if I revert at least partly.
> I think extracting the handleWhiteSpace() method into a separate
> class is still a good idea, even if only to avoid code-duplication
> and to have all the related logic together in one spot --no need to
> blame Jeremias for this thought :-)
> Combine this with the previous approach using the
> RecursiveCharIterators. I haven't removed much of that code anyway,
> didn't even rename the classes just yet, while they are currently
> never used recursively (=only deal with FOText and Characters).
> I see a remote possibility to exclude the markers whose class-name
> corresponds to at least one retrieve-marker that has an ancestor with
> non-default white-space-treatment/-collapse. If no such retrieve-
> marker exists, the white-space can be collapsed during refinement.
> All possible retrieve-markers in a page-sequence will, in any case,
> always be available at the point where a given marker is processed
> (and through them, also their ancestor-block's white-space related
> props). I'll see what I can do about this ASAP, although I'm not sure
> whether this will gain us much. The FOs are readily available, but
> they need to be reached all the same.
>
Thanks Andreas, I'll be happy with this course of action.
>
> Cheers,
>
> Andreas
>
Manuel
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 4, 2006, at 13:10, Manuel Mall wrote:
> I think I have bad news for all who weighed into this debate.
>
> It now appears to me that there was a very good reason for the
> original version of the whitespace refinement algorithm not being run
> on markers. For markers, refinement cannot be done in the context of
> the fo:marker as the actual property values (in this case the values
> for the white-space / linefeed related properties) can only be
> determined in the context of the fo:retrieve-marker.
<snip />
Ouch! This was one thing I indeed completely lost track of: the
properties governing white-space-treatment and the like for the
corresponding retrieve-marker... To add to all the fun, there is
indeed no way at all to solve this during refinement stage in a
generic way, taking into account alternating static-contents (page-
break context is needed for this).
<snip />
> white-space should NOT be removed but Andreas' change now does
> remove it.
...which is indeed only allowed in case of default values for those
props on the retrieve-marker. A bit too enthusiastic of me.
<snip />
> I am not quite sure what to recommend from here. Maybe along the
> following lines:
>
> 1. Leave the current status quo, including leaving Andreas' patch in
> the system. At least it covers the most common scenario - whitespace
> should be removed for markers. It does so in the wrong place, but we
> don't have anything better yet.
To be on the safe side, it seems better if I revert at least partly.
I think extracting the handleWhiteSpace() method into a separate
class is still a good idea, even if only to avoid code-duplication
and to have all the related logic together in one spot --no need to
blame Jeremias for this thought :-)
Combine this with the previous approach using the
RecursiveCharIterators. I haven't removed much of that code anyway,
didn't even rename the classes just yet, while they are currently
never used recursively (=only deal with FOText and Characters).
I see a remote possibility to exclude the markers whose class-name
corresponds to at least one retrieve-marker that has an ancestor with
non-default white-space-treatment/-collapse. If no such retrieve-
marker exists, the white-space can be collapsed during refinement.
All possible retrieve-markers in a page-sequence will, in any case,
always be available at the point where a given marker is processed
(and through them, also their ancestor-block's white-space related
props). I'll see what I can do about this ASAP, although I'm not sure
whether this will gain us much. The FOs are readily available, but
they need to be reached all the same.
Cheers,
Andreas
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 4 Jan 2006 08:26 am, Manuel Mall wrote:
> On Wed, 4 Jan 2006 03:51 am, Andreas L Delmelle wrote:
> > On Jan 2, 2006, at 06:27, Manuel Mall wrote:
> > > On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
> > BTW: there is another gap that isn't completely covered by my
> > alterations. Markers are always white-space-treated as inlines,
> > which would lead to incorrect results if a marker is retrieved in a
> > context like
> >
> > <fo:block><fo:retrieve-marker .../></fo:block>
> >
> > As I see it, this means that something like what I described above
> > will need to be considered for this case as well. If the marker is
> > retrieved as a child of an fo:inline, the currently produced result
> > will be correct.
> >
> > Since authors are allowed to have static-contents that retrieve the
> > same marker twice, once as child of a block and another as a child
> > of an inline, we can't possibly decide at FOTree stage if these
> > spaces may be removed.
>
> This is a very interesting point you are making here. I need to look
> into that a bit more.
>
I think I have bad news for all who weighed into this debate.
It now appears to me that there was a very good reason for the original
version of the whitespace refinement algorithm not being run on
markers. For markers, refinement cannot be done in the context of the
fo:marker as the actual property values (in this case the values for
the white-space / linefeed related properties) can only be determined
in the context of the fo:retrieve-marker. In this example:
<fo:block background-color="yellow" white-space-collapse="false">
<fo:retrieve-marker retrieve-class-name="m1" />
</fo:block>
...
<fo:marker marker-class-name="m1">
<fo:block>
First marker with whitespace around
</fo:block>
</fo:marker>
white-space should NOT be removed but Andreas' change now does remove it.
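
A small sketch may make the problem concrete. It is a deliberately simplified Python model, not the FOP algorithm: `refine` and `retrieve` are hypothetical names, and the only property modelled is white-space-collapse as a boolean. It shows that refining the marker text eagerly, in the context of the fo:marker, destroys white-space that a retrieve-marker context with white-space-collapse="false" would have kept.

```python
import re


def refine(text: str, collapse: bool) -> str:
    """Simplified white-space refinement for the text of one block.

    With collapse=True, runs of white-space become one space and the
    leading and trailing space are dropped. With collapse=False the text
    is returned untouched.
    """
    if not collapse:
        return text
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def retrieve(marker_text: str, collapse_in_retrieve_context: bool) -> str:
    """Refine marker text using the property of the retrieving context."""
    return refine(marker_text, collapse_in_retrieve_context)


MARKER_TEXT = "\n  First marker with whitespace around\n  "

# Correct order: decide only when the retrieve-marker context is known.
late = retrieve(MARKER_TEXT, collapse_in_retrieve_context=False)

# Eager order: the marker is refined with default values first, so the
# surrounding white-space is already gone when it is retrieved.
eager = retrieve(refine(MARKER_TEXT, collapse=True),
                 collapse_in_retrieve_context=False)
```

Here `late` still carries the surrounding white-space while `eager` does not, and nothing done at retrieval time can restore it.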
There have been endless discussions on property inheritance in the
context of markers on this list before and even this issue was raised
before: http://marc.theaimsgroup.com/?l=fop-dev&m=110254108019344&w=2.
Where does this leave us?
1. The patch is not solving the whitespace handling problem for markers
which was one of its initial drivers. We can blame Jeremias here - just
to drag in another innocent party :-) - as he suggested factoring out
the fo:block specific whitespace refinement so it can be applied to
markers. Unfortunately that was a bad idea.
2. Because of the marker issue we need to have whitespace handling in
layout before or as part of the Knuth element generation.
I am not quite sure what to recommend from here. Maybe along the
following lines:
1. Leave the current status quo, including leaving Andreas' patch in the
system. At least it covers the most common scenario - whitespace should
be removed for markers. It does so in the wrong place, but we
don't have anything better yet.
2. Add a testcase which shows the incorrect whitespace handling for
markers so we have a record of this. I can do that as I have basically
written a testcase as part of this investigation.
3. Put some effort into the Knuth element generation for line building
area as this is all interrelated:
whitespace handling
UAX#14 line breaking
Handling of unicode spaces, zwsp, etc
<snip/>
> >
> > Cheers,
> >
> > Andreas
>
Regards
Manuel
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Chris Bowditch <bo...@hotmail.com>.
Manuel Mall wrote:
> On Wed, 4 Jan 2006 03:51 am, Andreas L Delmelle wrote:
>>
Sorry to interject into this debate, but I have to say that I agree with
Manuel and thought I'd better speak up as this debate doesn't appear to
be making any progress.
Thanks for trying to improve this important area of the code Andreas, I
don't want to appear ungrateful for your efforts, it's just I have
similar concerns to Manuel.
>>To sum it up:
>>Our implementation of Donald Knuth's algorithm first creates the
>>element lists for the FOs, and then from those lists it calculates
>>the most favorable break-positions. Subsequently, it adds the areas
>>based on those breaks to the block-area, right?
>>Now, what I mean:
>>If the element-lists for the trailing spaces(*) are modeled
>>appropriately, and we add a forced break (infinite penalty) for the
>>end-of-block, then the algorithm will always create one final pseudo-
>>line-break(**) where those spaces are dissolved if present, just as
>>they would be when it were the first line. The generated
>>pseudo-line(s) will have no content at all. Maybe a minor tweak needed in
>>LineArea to return zero BPD when it has no child-areas, and there we
>>go... In Block.addChildArea, we can then test for zero-BPD line-areas
>>to keep them from effectively being added to the block.
>>
>>Something like that? Or am I still missing important implications?
>>
I think the important point is that the Knuth algorithm cannot be made
to strip trailing spaces. Only by placing hacky code around the
algorithm can this effect be achieved. Code which from my perspective
has caused a lot of bugs and unwanted side effects. Bugs which Jeremias
and Manuel seem to be constantly fixing in this area. So I think leading
and trailing space removal should be kept in the refinement (FO Tree)
stage for this reason.
Also, as Manuel pointed out, the Knuth algorithm does not handle cross
LM space removal. Something which can be achieved more easily in the FO
Tree.
<snip/>
Chris
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 4 Jan 2006 03:51 am, Andreas L Delmelle wrote:
> On Jan 2, 2006, at 06:27, Manuel Mall wrote:
> > On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
> >> Would it not be a much easier and much
> >> more straightforward solution to have every paragraph end with an
> >> infinitely low penalty, so that the algorithm eventually treats
> >> trailing spaces in the last line-area just the same as it would
> >> for 'normal' line-breaks?
> >
> > No, leading and trailing paragraph spaces must be removed BEFORE
> > linebreaking, that is before we get into the Knuth stuff otherwise
> > they may be incorrectly considered as part of the linebreaking line
> > length and adjustment calculations. Therefore when this was done
> > during refinement at the block level it was just the right place
> > IMO. Obviously spaces around formatter generated linebreaks must be
> > dealt with during linebreaking.
>
> Hmm... Yes, yes. We are growing closer. I think I like you. Well,
> actually, I'm growing a bit tired of this debate, but that's a Very
> Good Sign, if you catch the drift. :-)
>
> To sum it up:
> Our implementation of Donald Knuth's algorithm first creates the
> element lists for the FOs, and then from those lists it calculates
> the most favorable break-positions. Subsequently, it adds the areas
> based on those breaks to the block-area, right?
> Now, what I mean:
> If the element-lists for the trailing spaces(*) are modeled
> appropriately, and we add a forced break (infinite penalty) for the
> end-of-block, then the algorithm will always create one final pseudo-
> line-break(**) where those spaces are dissolved if present, just as
> they would be when it were the first line. The generated
> pseudo-line(s) will have no content at all. Maybe a minor tweak needed in
> LineArea to return zero BPD when it has no child-areas, and there we
> go... In Block.addChildArea, we can then test for zero-BPD line-areas
> to keep them from effectively being added to the block.
>
> Something like that? Or am I still missing important implications?
>
The point you are missing is that the Knuth algorithm only deletes
leading spaces in a line because it always breaks at the first of a
sequence of spaces. Therefore adding an infinite penalty at the end of
the paragraph doesn't achieve anything with respect to space removal.
And BTW we do add an infinite penalty at the end of a paragraph
already.
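
This can be illustrated with a toy model of how lines are cut out of a box/glue/penalty sequence once the breaks are chosen. The Python below is a sketch under stated assumptions, not the FOP implementation: `Element` and `lines_for_breaks` are hypothetical names, and the final penalty stands for the forced break at the end of the paragraph. Discardable elements (glue and penalties) are dropped only after a break, up to the next box; glue that precedes the final break stays in the last line.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str       # "box", "glue" or "penalty"
    text: str = ""


def lines_for_breaks(elements, breaks):
    """Cut `elements` into lines at the given break indices.

    The element at a break index is consumed by the break. After each
    break, discardable elements are skipped until the next box.
    """
    lines = []
    start = 0
    for b in breaks:
        while start < b and elements[start].kind != "box":
            start += 1          # discard glue/penalty at the line start
        lines.append(elements[start:b])
        start = b + 1
    return lines


elements = [
    Element("box", "foo"),
    Element("glue"),            # break is taken here, at the first space
    Element("glue"),            # discarded: leading glue of the next line
    Element("box", "bar"),
    Element("glue"),            # trailing space of the paragraph
    Element("penalty"),         # forced break at the end of the paragraph
]
lines = lines_for_breaks(elements, breaks=[1, 5])
kinds = [[e.kind for e in line] for line in lines]
```

In this model `kinds` ends up as one line holding only the box for "foo" and a second line holding the box for "bar" followed by the trailing glue, so the forced break at the end removes nothing.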
> (*) this made me wonder BTW in how many percent of the cases an
> fo:inline with a trailing space would actually end an fo:block.
> Anyone care to make an educated guess?
>
> (**) more than one in the very exceptional case where the trailing
> spaces would cause a line-break themselves, i.e. if there is just
> enough IPD left for one space, and we have more than one... but that
> would mean nested-nested-...-nested trailing fo:inlines, or one
> fo:inline with lots of non-collapsed spaces.
>
Not sure if this consideration is relevant.
> <snip />
>
> > That is not the point at all. The previous algorithm was defective
> > in the sense of not dealing with whitespace around markers and
> > possibly other fo's with text content.
>
> OK, so it is an improvement after all.
> Phew, <wipes forehead />, I almost thought I had become utterly
> useless... :-)
>
> > The task at hand was to extend the whitespace handling to other
> > fo's which were previously omitted, e.g. markers. Your change does
> > that; however, it does not preserve the existing functionality.
> > Therefore it's progress in one sense and regression in another.
> > What I am asking you to do is to look for a solution where we don't
> > have any regressions and still get the whitespace handling applied
> > to other fos.
>
> See my above description: it can be done with much less effort IIC,
> both efficiency- and code-wise, if this particular step is left to
> the layout algorithm.
That's where we disagree - we had a simple working solution before your
patch - I'd like to have that back. Putting it into layout is a
non-trivial exercise because it requires "cross fo/lm border" processing.
This is something layout currently doesn't do but the whitespace
routine at fo level before your patch did do. That's why I like it so
much :-).
>
> BTW: there is another gap that isn't completely covered by my
> alterations. Markers are always white-space-treated as inlines, which
> would lead to incorrect results if a marker is retrieved in a context
> like
>
> <fo:block><fo:retrieve-marker .../></fo:block>
>
> As I see it, this means that something like what I described above
> will need to be considered for this case as well. If the marker is
> retrieved as a child of an fo:inline, the currently produced result
> will be correct.
>
> Since authors are allowed to have static-contents that retrieve the
> same marker twice, once as child of a block and another as a child of
> an inline, we can't possibly decide at FOTree stage if these spaces
> may be removed.
>
This is a very interesting point you are making here. I need to look
into that a bit more.
> > BTW, if you had mentioned the regression in your patch description
> > I would have raised my objections at that time. You only mentioned
> > it after you applied the patch.
>
> True enough, I hadn't considered that. No harm intended and none
> taken, I hope...
Of course not.
>
> Anyway, up to here, this has yet again been a very stimulating
> discussion. Thanks for insisting on my reconsidering and rephrasing
> of ideas. At the start, I only *sensed* it was possible and desirable
> to move this to layout. Now I'm certain that it is not only possible,
> but also mandatory to do so, if we want to cover virtually all cases.
>
>
> Cheers,
>
> Andreas
Regards
Manuel
Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 2, 2006, at 06:27, Manuel Mall wrote:
> On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
>> Would it not be a much easier and much
>> more straightforward solution to have every paragraph end with an
>> infinitely low penalty, so that the algorithm eventually treats
>> trailing spaces in the last line-area just the same as it would for
>> 'normal' line-breaks?
>
> No, leading and trailing paragraph spaces must be removed BEFORE
> linebreaking, that is before we get into the Knuth stuff otherwise
> they may be incorrectly considered as part of the linebreaking line
> length and adjustment calculations. Therefore when this was done
> during refinement at the block level it was just the right place IMO.
> Obviously spaces around formatter generated linebreaks must be dealt
> with during linebreaking.
Hmm... Yes, yes. We are growing closer. I think I like you. Well,
actually, I'm growing a bit tired of this debate, but that's a Very
Good Sign, if you catch the drift. :-)
To sum it up:
Our implementation of Donald Knuth's algorithm first creates the
element lists for the FOs, and then from those lists it calculates
the most favorable break-positions. Subsequently, it adds the areas
based on those breaks to the block-area, right?
Now, what I mean:
If the element-lists for the trailing spaces(*) are modeled
appropriately, and we add a forced break (infinite penalty) for the
end-of-block, then the algorithm will always create one final pseudo-
line-break(**) where those spaces are dissolved if present, just as
they would be when it were the first line. The generated
pseudo-line(s) will have no content at all. Maybe a minor tweak needed in
LineArea to return zero BPD when it has no child-areas, and there we
go... In Block.addChildArea, we can then test for zero-BPD line-areas
to keep them from effectively being added to the block.
Something like that? Or am I still missing important implications?
(*) this made me wonder BTW in how many percent of the cases an
fo:inline with a trailing space would actually end an fo:block.
Anyone care to make an educated guess?
(**) more than one in the very exceptional case where the trailing
spaces would cause a line-break themselves, i.e. if there is just
enough IPD left for one space, and we have more than one... but that
would mean nested-nested-...-nested trailing fo:inlines, or one
fo:inline with lots of non-collapsed spaces.
<snip />
> That is not the point at all. The previous algorithm was defective in
> the sense of not dealing with whitespace around markers and possibly
> other fo's with text content.
OK, so it is an improvement after all.
Phew, <wipes forehead />, I almost thought I had become utterly
useless... :-)
> The task at hand was to extend the whitespace handling to other fo's
> which were previously omitted, e.g. markers. Your change does that;
> however, it does not preserve the existing functionality. Therefore
> it's progress in one sense and regression in another. What I am asking
> you to do is to look for a solution where we don't have any
> regressions and still get the whitespace handling applied to other fos.
See my above description: it can be done with much less effort IIC,
both efficiency- and code-wise, if this particular step is left to
the layout algorithm.
BTW: there is another gap that isn't completely covered by my
alterations. Markers are always white-space-treated as inlines, which
would lead to incorrect results if a marker is retrieved in a context
like
<fo:block><fo:retrieve-marker .../></fo:block>
As I see it, this means that something like what I described above
will need to be considered for this case as well. If the marker is
retrieved as a child of an fo:inline, the currently produced result
will be correct.
Since authors are allowed to have static-contents that retrieve the
same marker twice, once as child of a block and another as a child of
an inline, we can't possibly decide at FOTree stage if these spaces
may be removed.
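To illustrate why the decision depends on the retrieval context, here
is a small Java sketch. Nothing in it exists in FOP: MarkerContextSketch,
Parent and resolve are invented names, marker content is reduced to a
plain string, and only U+0020 is treated as white-space. The assumption
is the simplified one from above: as the only child of a block, the
marker's edge spaces sit at the start/end of a line and can go, while
inside an inline they are kept.

```java
/** Hypothetical sketch; none of these names exist in FOP. */
class MarkerContextSketch {
    enum Parent { BLOCK, INLINE }

    /**
     * Handles the marker's edge spaces only once the retrieval context
     * is known, so the same marker can yield different results.
     */
    static String resolve(String markerText, Parent parent) {
        if (parent == Parent.INLINE) {
            // spaces may separate the marker from neighbouring text
            return markerText;
        }
        int start = 0;
        int end = markerText.length();
        while (start < end && markerText.charAt(start) == ' ') {
            start++;
        }
        while (end > start && markerText.charAt(end - 1) == ' ') {
            end--;
        }
        return markerText.substring(start, end);
    }
}
```

The same marker text gives two different results depending on the
parent, which is exactly the information that is not yet available
when the marker itself is parsed.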
> BTW, if you had mentioned the regression in your patch description I
> would have raised my objections at that time. You only mentioned it
> after you applied the patch.
True enough, I hadn't considered that. No harm intended and none
taken, I hope...
Anyway, up to here, this has yet again been a very stimulating
discussion. Thanks for insisting on my reconsidering and rephrasing
of ideas. At the start, I only *sensed* it was possible and desirable
to move this to layout. Now I'm certain that it is not only possible,
but also mandatory to do so, if we want to cover virtually all cases.
Cheers,
Andreas
Posted by Manuel Mall <mm...@arcus.com.au>.
On Mon, 2 Jan 2006 12:56 am, Andreas L Delmelle wrote:
> On Jan 1, 2006, at 17:15, Manuel Mall wrote:
> > The Knuth algorithm (read the paper) deals only with box/pen/glue
> > for the purpose of breaking lines and if it breaks a line it takes
> > certain actions with respect to discarding pen/glue elements
> > directly following the break it created. If it doesn't create a
> > line break it leaves everything as it is. This means everything at
> > the beginning and end of a paragraph is left untouched.
> > line-feed-treatment at the beginning and end of a paragraph is not
> > influenced by the Knuth algorithm and therefore cannot be
> > controlled by whatever sequences we generate.
>
> Ahem... I do get your point, but the fact of the matter remains that
> the trailing spaces should be removed for the reason that they would
> end up at the end of a *line-area*, not because they end up at the
> end of the *paragraph*.
>
> I have no trouble grasping the idea that the Knuth algorithm only
> creates effective breaks in intermediate positions, and takes certain
> actions for those breaks. Ok, so that means the start- or end-of-
> paragraph line-break is not created by this algorithm in itself, and
> remains out-of-scope here. Would it not be a much easier and much
> more straightforward solution to have every paragraph end with an
> infinitely low penalty, so that the algorithm eventually treats
> trailing spaces in the last line-area just the same as it would for
> 'normal' line-breaks?
No, leading and trailing paragraph spaces must be removed BEFORE
linebreaking, that is, before we get into the Knuth stuff; otherwise
they may be incorrectly considered as part of the linebreaking line
length and adjustment calculations. Therefore, when this was done
during refinement at the block level, it was just the right place IMO.
Obviously spaces around formatter generated linebreaks must be dealt
with during linebreaking.
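A minimal Java sketch of the point about line length, under heavy
simplifying assumptions: ParagraphEdgeTrimSketch and both of its methods
are invented for illustration (they are not FOP code), only U+0020
counts as a removable space, and every character is given the same
assumed width.

```java
/** Hypothetical sketch; not part of FOP. */
class ParagraphEdgeTrimSketch {

    /** Removes plain spaces (U+0020 only) at both paragraph edges. */
    static String trimParagraphEdges(String text) {
        int start = 0;
        int end = text.length();
        while (start < end && text.charAt(start) == ' ') {
            start++;
        }
        while (end > start && text.charAt(end - 1) == ' ') {
            end--;
        }
        return text.substring(start, end);
    }

    /** Natural width, assuming every character has the same width. */
    static int naturalWidth(String text, int charWidth) {
        return text.length() * charWidth;
    }
}
```

If the paragraph text is handed to the line breaker untrimmed, the edge
spaces are counted in the natural width and so take part in the
adjustment calculations. A non-breaking space (U+00A0) at the edge is
deliberately left alone in this sketch.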
>
> > We can control line-feed-treatment at Knuth generated breaks by
> > constructing the proper sequences which we will do eventually. But
> > start/end paragraph is outside of that which is why I am keen to
> > push it into the FO refinement stage (as it used to be).
>
> As I said, it's all the same to me. If you (and a few others, of
> course) think we were better off before I committed my changes, then
> by all means, go ahead and revert... I did my homework, and posted it
> as a patch for review first. As I recall, only Finn had anything to
> add, and his comment was taken into account. The rest of you remained
> silent, which I consider to be at least a '+0' (= go ahead if you
> want to, but don't expect any assistance from us, because we already
> have our hands full).
>
That is not the point at all. The previous algorithm was defective in
the sense of not dealing with whitespace around markers and possibly
other fo's with text content.
The task at hand was to extend the whitespace handling to other fo's
which were previously omitted, e.g. markers. Your change does that;
however, it does not preserve the existing functionality. Therefore
it's progress in one sense and regression in another. What I am asking
you to do is to look for a solution where we don't have any regressions
and still get the whitespace handling applied to other fos.
BTW, if you had mentioned the regression in your patch description I
would have raised my objections at that time. You only mentioned it
after you applied the patch.
>
> Cheers,
>
> Andreas
Regards
Manuel
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jan 1, 2006, at 17:15, Manuel Mall wrote:
> The Knuth algorithm (read the paper) deals only with box/pen/glue for
> the purpose of breaking lines and if it breaks a line it takes certain
> actions with respect to discarding pen/glue elements directly
> following the break it created. If it doesn't create a line break it
> leaves everything as it is. This means everything at the beginning and
> end of a paragraph is left untouched. line-feed-treatment at the
> beginning and end of a paragraph is not influenced by the Knuth
> algorithm and therefore cannot be controlled by whatever sequences we
> generate.
Ahem... I do get your point, but the fact of the matter remains that
the trailing spaces should be removed for the reason that they would
end up at the end of a *line-area*, not because they end up at the
end of the *paragraph*.
I have no trouble grasping the idea that the Knuth algorithm only
creates effective breaks in intermediate positions, and takes certain
actions for those breaks. Ok, so that means the start- or end-of-
paragraph line-break is not created by this algorithm in itself, and
remains out-of-scope here. Would it not be a much easier and much
more straightforward solution to have every paragraph end with an
infinitely low penalty, so that the algorithm eventually treats
trailing spaces in the last line-area just the same as it would for
'normal' line-breaks?
> We can control line-feed-treatment at Knuth generated breaks by
> constructing the proper sequences which we will do eventually. But
> start/end paragraph is outside of that which is why I am keen to push
> it into the FO refinement stage (as it used to be).
As I said, it's all the same to me. If you (and a few others, of
course) think we were better off before I committed my changes, then
by all means, go ahead and revert... I did my homework, and posted it
as a patch for review first. As I recall, only Finn had anything to
add, and his comment was taken into account. The rest of you remained
silent, which I consider to be at least a '+0' (= go ahead if you
want to, but don't expect any assistance from us, because we already
have our hands full).
Cheers,
Andreas
Posted by Manuel Mall <mm...@arcus.com.au>.
On Sun, 1 Jan 2006 10:22 pm, Andreas L Delmelle wrote:
> On Dec 31, 2005, at 17:02, Andreas L Delmelle wrote:
>
> (been pondering a bit more over this, and...)
>
> > Et voilà, that seems to be where the real *flaw* is located, if you
> > ask me. It should care about glues at the beginning of a line --
> > which it seems to handle perfectly ATM--
>
> In fact, this may currently be handled 'too perfectly'. One of the
> testcases --block_white-space_2.xml-- fails because a leading non-
> breaking space is removed, contrary to the expectation.
>
> Don't get me wrong. I still think that it is unnecessary to remove
> the mentioned trailing white-space for trailing nested inlines in a
> paragraph in the FOTree.
>
> Only, I think I'm beginning to see what is meant by this paradox:
> > Besides that, I get the impression you're somewhat contradicting
> > yourself here:
> > - in the comment on the failing testcase you noted that 'These
> > tests fail because the Knuth element sequences for consecutive
> > whitespace are not correct.'
> > - and now you're saying that it's not a matter of generating the
> > correct element sequences
>
You still don't seem to quite get my point.
The Knuth algorithm (read the paper) deals only with box/pen/glue for
the purpose of breaking lines and if it breaks a line it takes certain
actions with respect to discarding pen/glue elements directly following
the break it created. If it doesn't create a line break it leaves
everything as it is. This means everything at the beginning and end of
a paragraph is left untouched. line-feed-treatment at the beginning and
end of a paragraph is not influenced by the Knuth algorithm and
therefore cannot be controlled by whatever sequences we generate.
We can control line-feed-treatment at Knuth generated breaks by
constructing the proper sequences which we will do eventually. But
start/end paragraph is outside of that which is why I am keen to push
it into the FO refinement stage (as it used to be).
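To make that concrete, a minimal Java sketch. It is not FOP code:
KnuthDiscardSketch, its Element type and the splitAt/render helpers are
made up for illustration, the break positions are simply given rather
than computed, and forced breaks are ignored when discarding. It only
shows that glue/penalty elements are dropped directly after a chosen
break, while glue at the very start and end of the paragraph is left
as it is.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for Knuth elements; not FOP's real classes. */
class KnuthDiscardSketch {
    enum Kind { BOX, GLUE, PENALTY }

    static final class Element {
        final Kind kind;
        final String text;

        Element(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /**
     * Splits the list into lines at the given break indices (ascending,
     * each pointing at the glue or penalty chosen as break). The break
     * element and any glue/penalty directly following it are discarded;
     * everything else is kept untouched.
     */
    static List<List<Element>> splitAt(List<Element> elements, int[] breaks) {
        List<List<Element>> lines = new ArrayList<>();
        int start = 0;
        for (int b : breaks) {
            lines.add(new ArrayList<>(elements.subList(start, b)));
            start = b + 1;
            // discard glue and penalties directly after the break
            while (start < elements.size()
                    && elements.get(start).kind != Kind.BOX) {
                start++;
            }
        }
        lines.add(new ArrayList<>(elements.subList(start, elements.size())));
        return lines;
    }

    /** Renders a line, showing each surviving glue as an underscore. */
    static String render(List<Element> line) {
        StringBuilder sb = new StringBuilder();
        for (Element e : line) {
            sb.append(e.kind == Kind.GLUE ? "_" : e.text);
        }
        return sb.toString();
    }
}
```

For the sequence glue, box "a", glue, glue, box "b", glue with a break
at the first of the two middle glues, the second middle glue is
discarded, but the leading glue of the first line and the trailing glue
of the last line both survive.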
>
> Would this be a correct assessment?
>
>
> Cheers,
>
> Andreas
Manuel