Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2005/11/01 09:25:31 UTC

Re: White space handling Wiki page

On Oct 31, 2005, at 22:18, Andreas L Delmelle wrote:

> On Oct 27, 2005, at 06:29, Manuel Mall wrote:
>> Actually something like:
>> <fo:block background-color="yellow">word1<fo:character
>> character="&#10;"/><fo:character character=
>> " "/>word2<fo:character character=" "/>word3<fo:character
>> character="&#10;"/></fo:block>
>> currently causes an exception!
>>
>
>
> The problem can be solved by a slight modification to OneCharIterator:
> * add a constructor with Character parameter (and member)
> * add a remove() implementation which makes Character's parent  
> remove it from its list of child nodes
>
> Tested locally (very quickly), and seems to work nicely. If I get  
> the chance to commit it in the next few days, I'll do so myself,  
> but if you want to have a go, it's a pretty easy fix (adds up to  
> about 10-15 LOC incl. javadocs :-))

Oops, I was too quick. That only trades an UnsupportedOperationException
for a ConcurrentModificationException...
The trick seems to be to introduce a small boolean 'discard' switch
on the Character object, flip it when OCIter.remove() is called, and
have the Block/Inline later remove any of its characters marked as
discardable, but (of course) only after the RecursiveCharIterator has
finished, so that the childNodes list is not altered while it is
being iterated over...

Other option: store a list of the discardable space fo:characters at  
Block or Inline level, instead of marking the Character itself as  
such...

A bit more than 15 LOC, but still quite doable.
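
A rough sketch of the mark-then-purge idea, for illustration only. CharNode and ParentNode are hypothetical stand-ins, not FOP's actual Character, Block or Inline classes, and the collapsing rule (drop every space that follows a space) is a simplifying assumption. The point is just that the iteration only flips a flag, and the list is changed in a separate pass afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an fo:character node (not FOP's Character class).
class CharNode {
    final char value;
    boolean discard; // flipped instead of removing the node while iterating

    CharNode(char value) {
        this.value = value;
    }
}

// Hypothetical stand-in for a Block/Inline owning a childNodes list.
class ParentNode {
    final List<CharNode> childNodes = new ArrayList<CharNode>();

    // Second pass: runs only after the marking iteration has finished,
    // and removes through its own iterator, so no
    // ConcurrentModificationException can occur.
    void purgeDiscarded() {
        for (Iterator<CharNode> it = childNodes.iterator(); it.hasNext();) {
            if (it.next().discard) {
                it.remove();
            }
        }
    }
}

class DeferredRemovalSketch {
    // Builds a parent whose children are the characters of s.
    static ParentNode of(String s) {
        ParentNode parent = new ParentNode();
        for (int i = 0; i < s.length(); i++) {
            parent.childNodes.add(new CharNode(s.charAt(i)));
        }
        return parent;
    }

    // Marks every space that follows another space, then purges.
    static String collapse(ParentNode parent) {
        boolean previousWasSpace = false;
        for (CharNode c : parent.childNodes) {
            boolean isSpace = c.value == ' ';
            if (isSpace && previousWasSpace) {
                c.discard = true; // mark only; the list is left untouched here
            }
            previousWasSpace = isSpace;
        }
        parent.purgeDiscarded();

        StringBuilder sb = new StringBuilder();
        for (CharNode c : parent.childNodes) {
            sb.append(c.value);
        }
        return sb.toString();
    }
}
```

The other option mentioned above (a list of discardable characters kept at Block or Inline level) would replace the boolean with a second list filled during the first pass; the two-pass shape stays the same.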

Cheers,

Andreas


Re: White space handling Wiki page

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Nov 1, 2005, at 10:04, Manuel Mall wrote:

>
> I am sure it is doable - but is it worth it at this stage? Possibly,
> once we understand the white-space handling issues better, the whole
> current system will need revision? One problem with the current char
> iterator is that it iterates across inline boundaries, which causes
> white space to be collapsed across them; according to the
> clarification from the WG, that is incorrect. IMO, to implement the
> refinement step of the white space handling (which currently happens
> in the flow.Block object) we need an iterator which goes through all
> characters but indicates fo boundaries (not including fo:characters)
> so we can do:
> a) linefeed treatment across all characters;
> b) white space collapse across each consecutive section of
> implicit/explicit fo:characters, i.e. delimited by the start/end of
> fo's;
> c1) white-space-treatment from the start of the fo:block to the first
> non white-space character;
> The iterator must also be able to either operate backwards or be
> reset to a particular position (last non white space character) so
> we can do:
> c2) white-space-treatment from the end of the fo:block backwards to
> the first non white-space character
>
> It must also support character deletions and character substitutions.
>
> Does that make sense?

Very much. Precisely with that in mind, I've also been contemplating
moving part of the whitespace handling to inline level. This would
keep the nested inlines separated from the Block's own direct FOText
descendants. At the same time, in combination with the modification I
already described, it would give us an opportunity to remove
fo:characters from within the nested inlines, which would become
quite a pain if this removal were deferred to block level.

So the RecursiveCharIterator should only create Iterators over
regular FOText or fo:characters that are direct descendants of the
Block/Inline. FOText of nested FObjs should be left alone, since its
whitespace will already have been collapsed. IOW, it should stop
being... recursive?

Currently, whitespace handling is triggered from the moment a Block
encounters a child node that is neither FOText nor generates inline
areas. In principle this seems OK. The only difference I'd propose is
that inlines do their own whitespace handling, so that *if*
whitespace needs to be collapsed across fo boundaries (maybe there
are cases?), the block level only needs to look at the first and last
characters in an inline's text.
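
To make that last point a bit more concrete, here is a small sketch. InlineText and BlockEdgeSketch are made-up names, not FOP classes, and whether collapsing across fo boundaries is ever wanted is exactly the open question above, so take the join() step as the hypothetical case only. Each inline collapses its own text; the block-level pass then inspects nothing but the edge characters.

```java
import java.util.List;

// Hypothetical stand-in for the text of one fo:inline (not a FOP class).
class InlineText {
    private final StringBuilder text = new StringBuilder();

    // The inline does its own whitespace handling: runs of spaces in its
    // own text are collapsed to a single space on construction.
    InlineText(String raw) {
        boolean previousWasSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean isSpace = c == ' ';
            if (!(isSpace && previousWasSpace)) {
                text.append(c);
            }
            previousWasSpace = isSpace;
        }
    }

    boolean startsWithSpace() {
        return text.length() > 0 && text.charAt(0) == ' ';
    }

    boolean endsWithSpace() {
        return text.length() > 0 && text.charAt(text.length() - 1) == ' ';
    }

    void dropFirst() {
        text.deleteCharAt(0);
    }

    String value() {
        return text.toString();
    }
}

class BlockEdgeSketch {
    // Block-level pass for the hypothetical cross-boundary case: only the
    // first and last character of each inline's text are examined.
    static String join(List<InlineText> inlines) {
        StringBuilder out = new StringBuilder();
        boolean previousEndsWithSpace = false;
        for (InlineText inline : inlines) {
            if (previousEndsWithSpace && inline.startsWithSpace()) {
                inline.dropFirst();
            }
            out.append(inline.value());
            // An inline that is (now) empty does not change the edge state.
            if (!inline.value().isEmpty()) {
                previousEndsWithSpace = inline.endsWithSpace();
            }
        }
        return out.toString();
    }
}
```

If cross-boundary collapsing turns out never to be needed, join() reduces to plain concatenation and the block never has to look inside an inline at all.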


Cheers,

Andreas


Re: White space handling Wiki page

Posted by Manuel Mall <mm...@arcus.com.au>.
On Tue, 1 Nov 2005 04:25 pm, Andreas L Delmelle wrote:
> On Oct 31, 2005, at 22:18, Andreas L Delmelle wrote:
> > On Oct 27, 2005, at 06:29, Manuel Mall wrote:
> >> Actually something like:
> >> <fo:block background-color="yellow">word1<fo:character
> >> character="&#10;"/><fo:character character=
> >> " "/>word2<fo:character character=" "/>word3<fo:character
> >> character="&#10;"/></fo:block>
> >> currently causes an exception!
> >
> > The problem can be solved by a slight modification to
> > OneCharIterator:
> > * add a constructor with Character parameter (and member)
> > * add a remove() implementation which makes Character's parent
> > remove it from its list of child nodes
> >
> > Tested locally (very quickly), and seems to work nicely. If I get
> > the chance to commit it in the next few days, I'll do so myself,
> > but if you want to have a go, it's a pretty easy fix (adds up to
> > about 10-15 LOC incl. javadocs :-))
>
> Oops, been too quick. From an UnsupportedOperationException to a
> ConcurrentModificationException...
> The trick seems to be to introduce a small boolean 'discard' switch
> to the Character object, flip this upon calling OCIter.remove(), and
> have the Block/Inline later remove any of its characters marked as
> discardable, but do this (of course) only after the
> RecursiveCharIterator has finished --to avoid the childNodes list
> from being altered while it's being iterated over...
>
> Other option: store a list of the discardable space fo:characters at
> Block or Inline level, instead of marking the Character itself as
> such...
>
> A bit more than 15 LOC, but still quite doable.

I am sure it is doable - but is it worth it at this stage? Possibly,
once we understand the white-space handling issues better, the whole
current system will need revision? One problem with the current char
iterator is that it iterates across inline boundaries, which causes
white space to be collapsed across them; according to the
clarification from the WG, that is incorrect. IMO, to implement the
refinement step of the white space handling (which currently happens
in the flow.Block object) we need an iterator which goes through all
characters but indicates fo boundaries (not including fo:characters)
so we can do:
a) linefeed treatment across all characters;
b) white space collapse across each consecutive section of
implicit/explicit fo:characters, i.e. delimited by the start/end of
fo's;
c1) white-space-treatment from the start of the fo:block to the first
non white-space character;
The iterator must also be able to either operate backwards or be
reset to a particular position (last non white space character) so
we can do:
c2) white-space-treatment from the end of the fo:block backwards to
the first non white-space character

It must also support character deletions and character substitutions.

Does that make sense?
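
A simplified sketch of steps a) to c2), to illustrate what is meant. Everything here is an assumption made for the example: the BOUNDARY sentinel stands in for "start/end of an fo", linefeeds are treated as spaces, and white-space-treatment is reduced to simply dropping the leading and trailing spaces of the block. It is not how flow.Block does it today.

```java
import java.util.ArrayList;
import java.util.List;

class BoundaryAwareSketch {
    // Hypothetical sentinel marking the start or end of an fo in the stream.
    static final char BOUNDARY = '\uFFFF';

    static String refine(String input) {
        List<Character> chars = new ArrayList<Character>();

        // a) linefeed treatment across all characters (a substitution):
        //    in this sketch a linefeed simply becomes a space.
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            chars.add(c == '\n' ? ' ' : c);
        }

        // b) collapse runs of spaces, but only within a section; a
        //    boundary resets the run so nothing collapses across it.
        boolean previousWasSpace = false;
        for (int i = 0; i < chars.size();) {
            char c = chars.get(i);
            if (c == BOUNDARY) {
                previousWasSpace = false;
                i++;
            } else if (c == ' ' && previousWasSpace) {
                chars.remove(i); // deletion; index stays on the next char
            } else {
                previousWasSpace = c == ' ';
                i++;
            }
        }

        // c1) from the start of the block to the first non white-space
        //     character, skipping over boundaries.
        for (int i = 0; i < chars.size();) {
            char c = chars.get(i);
            if (c == BOUNDARY) {
                i++;
            } else if (c == ' ') {
                chars.remove(i);
            } else {
                break;
            }
        }

        // c2) from the end of the block backwards to the first non
        //     white-space character.
        for (int i = chars.size() - 1; i >= 0; i--) {
            char c = chars.get(i);
            if (c == BOUNDARY) {
                continue;
            }
            if (c != ' ') {
                break;
            }
            chars.remove(i);
        }

        // Emit the refined text without the boundary markers.
        StringBuilder out = new StringBuilder();
        for (char c : chars) {
            if (c != BOUNDARY) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that two spaces can survive next to each other when they sit on opposite sides of a boundary, which is the behaviour the WG clarification asks for.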

>
> Cheers,
>
> Andreas

Manuel