Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2005/11/01 09:25:31 UTC
Re: White space handling Wiki page
On Oct 31, 2005, at 22:18, Andreas L Delmelle wrote:
> On Oct 27, 2005, at 06:29, Manuel Mall wrote:
>> Actually something like:
>> <fo:block background-color="yellow">word1<fo:character
>> character=" "/><fo:character character=
>> " "/>word2<fo:character character=" "/>word3<fo:character
>> character=" "/></fo:block>
>> currently causes an exception!
>>
>
>
> The problem can be solved by a slight modification to OneCharIterator:
> * add a constructor with Character parameter (and member)
> * add a remove() implementation which makes Character's parent
> remove it from its list of child nodes
>
> Tested locally (very quickly), and seems to work nicely. If I get
> the chance to commit it in the next few days, I'll do so myself,
> but if you want to have a go, it's a pretty easy fix (adds up to
> about 10-15 LOC incl. javadocs :-))
Oops, I was too quick. The fix just turns the UnsupportedOperationException
into a ConcurrentModificationException...
The trick seems to be to introduce a small boolean 'discard' switch
on the Character object, flip it when OCIter.remove() is called, and
have the Block/Inline later remove any of its characters marked as
discardable, but (of course) only after the
RecursiveCharIterator has finished, so that the childNodes list
is not altered while it's being iterated over...
Other option: store a list of the discardable space fo:characters at
Block or Inline level, instead of marking the Character itself as
such...
A bit more than 15 LOC, but still quite doable.
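For illustration, a minimal sketch of the 'discard' switch idea in plain Java. DiscardableChar and CharContainer are hypothetical stand-ins for Character and Block/Inline, not FOP's actual classes, and removeIf() assumes Java 8 or later. The first pass only marks nodes while iterating; the second pass removes them once iteration is over, which is what avoids the ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an fo:character node (not FOP's Character class). */
class DiscardableChar {
    final char ch;
    boolean discard; // flipped instead of removing the node right away

    DiscardableChar(char ch) {
        this.ch = ch;
    }
}

/** Hypothetical stand-in for a Block/Inline that owns a list of child nodes. */
class CharContainer {
    final List<DiscardableChar> childNodes = new ArrayList<>();

    /** Pass 1: iterate and only mark; the list is not structurally modified. */
    void markCollapsibleSpaces() {
        boolean previousWasSpace = false;
        for (DiscardableChar node : childNodes) {
            boolean isSpace = node.ch == ' ';
            if (isSpace && previousWasSpace) {
                node.discard = true;
            }
            previousWasSpace = isSpace;
        }
    }

    /** Pass 2: called after the iteration has finished, so removal is safe. */
    void sweepDiscarded() {
        childNodes.removeIf(node -> node.discard);
    }

    /** Concatenates the remaining characters, for inspection. */
    String text() {
        StringBuilder sb = new StringBuilder();
        for (DiscardableChar node : childNodes) {
            sb.append(node.ch);
        }
        return sb.toString();
    }
}
```

Calling markCollapsibleSpaces() and then sweepDiscarded() on a container holding the characters of "a  b" should leave "a b". The other option (a separate list of discardable nodes kept at Block or Inline level) would differ only in where the marks are stored.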
Cheers,
Andreas
Re: White space handling Wiki page
Posted by Andreas L Delmelle <a_...@pandora.be>.
On Nov 1, 2005, at 10:04, Manuel Mall wrote:
>
> I am sure it is doable - but is it worth it at this stage? Possibly,
> after a better understanding of the white-space handling issues, the
> whole current system needs revision? One problem with the current char
> iterator is that it iterates over inline boundaries, which causes white
> space to be collapsed across those, which according to the clarification
> of the WG is incorrect. IMO to implement the refinement step of the
> white space handling (which currently happens in the flow.Block object)
> we need an iterator which goes through all characters but indicates fo
> boundaries (not including fo:characters) so we can do:
> a) linefeed treatment across all characters;
> b) white space collapse across each consecutive section of
> implicit/explicit fo:characters, i.e. delimited by the start/end of
> fo's;
> c 1) white-space-treatment from the start of the fo:block to the first
> non white-space character;
> The iterator must also be able to either operate backwards or be able to
> be reset to a particular position (last non white space character) so
> we can do:
> c 2) white-space-treatment from the end of the fo:block backwards to
> the first non white-space character
>
> It must also support character deletions and character substitutions.
>
> Does that make sense?
Very much. Precisely with that in mind, I've also been contemplating
moving part of the whitespace-handling to inline-level. This would
keep the nested inlines separated from the Block's own direct FOText
descendants (and at the same time, in combination with the
modification I already described, this would provide us with an
opportunity to remove fo:characters from within the nested inlines --
which would become quite a pain if this removal is deferred to block-
level)
So the RecursiveCharIterator should only create Iterators over
regular FOText or fo:characters that are direct descendants of the
Block/Inline. FOText of nested FObjs should be left alone, since the
whitespace will already be collapsed. IOW, it should stop being --
recursive?
Currently, whitespace handling is triggered from the moment a Block
encounters a child node that is not FOText and does not generate
inline areas. In principle this seems OK; the only difference I'd
propose is that inlines do their own whitespace handling, so that *if*
whitespace needs to be collapsed across fo boundaries --maybe there are
cases?--, the block-level only needs to look at the first and last
characters in an inline's text.
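A rough sketch of that split, in plain Java rather than FOP code. Each segment stands for either the Block's own direct text or the text of one nested inline. The class name, the method names and the collapseAcrossBoundaries switch are all made up for the example. Every segment collapses its own whitespace first; the block-level step then only compares the last character of one segment with the first character of the next.

```java
import java.util.ArrayList;
import java.util.List;

class InlineLevelWhitespace {

    /** Inline-level step: collapses runs of spaces within one piece of text. */
    static String collapse(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean duplicateSpace = c == ' '
                    && sb.length() > 0
                    && sb.charAt(sb.length() - 1) == ' ';
            if (!duplicateSpace) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Block-level step. The segments are the block's direct text runs and its
     * inlines' text, in document order. The block never re-scans a segment;
     * it only inspects the first and last characters.
     */
    static List<String> handleBlock(List<String> segments,
                                    boolean collapseAcrossBoundaries) {
        List<String> result = new ArrayList<>();
        boolean previousEndsWithSpace = false;
        for (String segment : segments) {
            String s = collapse(segment); // each inline handles its own text
            if (collapseAcrossBoundaries && previousEndsWithSpace
                    && s.startsWith(" ")) {
                s = s.substring(1); // drop the space duplicated across the boundary
            }
            if (!s.isEmpty()) {
                previousEndsWithSpace = s.endsWith(" ");
            }
            result.add(s);
        }
        return result;
    }
}
```

With the switch off, the segments "word1  ", "  word2 " and " word3" come out as "word1 ", " word2 " and " word3", so nothing is collapsed across the boundaries. With the switch on, the leading space of the second and third segments is dropped.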
Cheers,
Andreas
Re: White space handling Wiki page
Posted by Manuel Mall <mm...@arcus.com.au>.
On Tue, 1 Nov 2005 04:25 pm, Andreas L Delmelle wrote:
> On Oct 31, 2005, at 22:18, Andreas L Delmelle wrote:
> > On Oct 27, 2005, at 06:29, Manuel Mall wrote:
> >> Actually something like:
> >> <fo:block background-color="yellow">word1<fo:character
> >> character=" "/><fo:character character=
> >> " "/>word2<fo:character character=" "/>word3<fo:character
> >> character=" "/></fo:block>
> >> currently causes an exception!
> >
> > The problem can be solved by a slight modification to
> > OneCharIterator:
> > * add a constructor with Character parameter (and member)
> > * add a remove() implementation which makes Character's parent
> > remove it from its list of child nodes
> >
> > Tested locally (very quickly), and seems to work nicely. If I get
> > the chance to commit it in the next few days, I'll do so myself,
> > but if you want to have a go, it's a pretty easy fix (adds up to
> > about 10-15 LOC incl. javadocs :-))
>
> Oops, I was too quick. The fix just turns the UnsupportedOperationException
> into a ConcurrentModificationException...
> The trick seems to be to introduce a small boolean 'discard' switch
> on the Character object, flip it when OCIter.remove() is called, and
> have the Block/Inline later remove any of its characters marked as
> discardable, but (of course) only after the
> RecursiveCharIterator has finished, so that the childNodes list
> is not altered while it's being iterated over...
>
> Other option: store a list of the discardable space fo:characters at
> Block or Inline level, instead of marking the Character itself as
> such...
>
> A bit more than 15 LOC, but still quite doable.
I am sure it is doable - but is it worth it at this stage? Possibly,
after a better understanding of the white-space handling issues, the
whole current system needs revision? One problem with the current char
iterator is that it iterates over inline boundaries, which causes white
space to be collapsed across those, which according to the clarification
of the WG is incorrect. IMO to implement the refinement step of the
white space handling (which currently happens in the flow.Block object)
we need an iterator which goes through all characters but indicates fo
boundaries (not including fo:characters) so we can do:
a) linefeed treatment across all characters;
b) white space collapse across each consecutive section of
implicit/explicit fo:characters, i.e. delimited by the start/end of
fo's;
c 1) white-space-treatment from the start of the fo:block to the first
non white-space character;
The iterator must also be able to either operate backwards or be able to
be reset to a particular position (last non white space character) so
we can do:
c 2) white-space-treatment from the end of the fo:block backwards to
the first non white-space character
It must also support character deletions and character substitutions.
Does that make sense?
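To make the requirements concrete, here is a small sketch in plain Java, not tied to FOP's classes. It assumes the FO tree has been flattened into a list of characters in which a sentinel value (the hypothetical BOUNDARY constant) marks the start or end of a nested fo. java.util.ListIterator already offers set(), remove() and previous(), which correspond to substitution, deletion and backwards traversal. The sketch also assumes linefeeds are treated as spaces and that leading and trailing spaces are simply dropped, which simplifies the real properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

class BoundaryAwareWhitespace {
    /** Hypothetical sentinel marking the start or end of a nested fo. */
    static final char BOUNDARY = '\u0000';

    static void refine(List<Character> items) {
        // a) linefeed treatment across all characters, and
        // b) collapse of space runs, restarted at every fo boundary.
        boolean previousWasSpace = false;
        ListIterator<Character> it = items.listIterator();
        while (it.hasNext()) {
            char c = it.next();
            if (c == BOUNDARY) {
                previousWasSpace = false; // a boundary ends the current section
                continue;
            }
            if (c == '\n') {
                c = ' ';
                it.set(c); // character substitution
            }
            if (c == ' ' && previousWasSpace) {
                it.remove(); // character deletion
            } else {
                previousWasSpace = (c == ' ');
            }
        }
        // c 1) from the start of the block to the first non-space character.
        it = items.listIterator();
        while (it.hasNext()) {
            char c = it.next();
            if (c == BOUNDARY) continue;
            if (c != ' ') break;
            it.remove();
        }
        // c 2) from the end of the block backwards to the first non-space character.
        it = items.listIterator(items.size());
        while (it.hasPrevious()) {
            char c = it.previous();
            if (c == BOUNDARY) continue;
            if (c != ' ') break;
            it.remove();
        }
    }

    /** Test helper: '|' in the input string stands for an fo boundary. */
    static List<Character> parse(String s) {
        List<Character> items = new ArrayList<>();
        for (char c : s.toCharArray()) {
            items.add(c == '|' ? BOUNDARY : c);
        }
        return items;
    }

    /** Test helper: renders boundaries back as '|'. */
    static String render(List<Character> items) {
        StringBuilder sb = new StringBuilder();
        for (char c : items) {
            sb.append(c == BOUNDARY ? '|' : c);
        }
        return sb.toString();
    }
}
```

For the input "  a  |  b\n |c  " (with '|' standing for a boundary), refine() should leave "a | b |c": the spaces on either side of the first boundary both survive, because collapsing restarts at each boundary.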
>
> Cheers,
>
> Andreas
Manuel