Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2008/05/29 17:03:40 UTC

DO NOT REPLY [Bug 45097] New: Questionable white-space-treatment behavior

https://issues.apache.org/bugzilla/show_bug.cgi?id=45097

           Summary: Questionable white-space-treatment behavior
           Product: Fop
           Version: 1.0dev
          Platform: PC
        OS/Version: Windows Vista
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: sgriffin@cerner.com


Created an attachment (id=22029)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22029)
FO markup for example without wrapping block

I've done quite a bit of digging on the web site and in the mailing lists to
track down this issue, and there do seem to be some issues around white-space
handling.  What I'm not clear on is whether the already-documented issues
around white-space handling match the behavior I'm seeing, so I'll log this bug
to explain.

Except for a few peculiar cases, the behavior documented in the specification
regarding white-space-treatment does seem to be implemented correctly, but I'm
wondering if the specification is either misinterpreted or wrong to begin with.

I'll attach some FO/PDF examples to explain better, but the bottom-line issue
is that there does not appear to be a way to preserve spaces at the beginning
of a line without also creating hanging indents for long blocks that have
formatter-generated line feeds.  The "ignore-if-surrounding-linefeed" property
value solves the hanging-indent problem, but it also prevents spaces at the
start of a block from being preserved.

Curiously, if I add inline children to the block, the treatment of the
whitespace is different, and if I wrap the various blocks in a single parent
block, the whitespace treatment changes again.  Please see the attachments for
what I'm talking about.

I've tried this in both FOP 0.95beta and FOP Trunk with the same results.

To summarize, I see 3 questionable items:
1. Shouldn't the whitespace_without_wrapping_block.pdf match the
whitespace_with_wrapping_block.pdf?
2. In whitespace_without_wrapping_block.pdf, is the behavior of Example 2
correct where whitespace is preserved inside inline elements even when
white-space-treatment != "preserve"?
3. In whitespace_without_wrapping_block.pdf, is there a way to get Example 1
behavior and Example 5 behavior with the same block property settings (to
prevent Example 4 behavior)?


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #14 from Andreas L. Delmelle <ad...@apache.org>  2008-11-25 13:12:37 PST ---
(In reply to comment #12)

Sorry to chime in so late...

> Based on my novice analysis, it appears the various KnuthElements provide the
> following purposes:
<snip />

Entirely correct interpretation.

A box is never a break-possibility, unless preceded by a penalty indicating
one. Glues are always a break-possibility, unless preceded by a penalty
prohibiting one. That's the general idea. 
If a glue simply appears in between two boxes, then when it is chosen as the
effective break, it dissolves. To generate the effect of preserved spaces or
account for alignment other than "justify", one needs a sequence of those
elements to represent the different effects (break/no-break).
If a glue is followed by a glue, then the latter becomes the more favorable
break. The former could then simply be discarded as a possibility.

> This matches what Andreas shows as the sequence for a preserved space (glue,
> penalty=0, glue, aux. box w=0, penalty=inf, glue).  Is my analysis of each
> KnuthElement and the purpose it serves correct?  I still don't understand how
> it gets the stretch values that it does, 

A point which has been questioned recently: 10008 is exactly the width of
three normal spaces (3 × 3336), indeed to handle alignment other than
"justify", but it has proven to have nasty side-effects for long blocks with a
relatively small line-width (multi-column documents), where three spaces
represent a large portion of the line... The suggestion has been raised to make
this a percentage of the line-width, and IIC, we would also need to take the
font-size into account.
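The arithmetic behind that constant can be checked against the element widths quoted in comment #12. This is a minimal sketch, assuming FOP's default font (Helvetica at 12pt) and the standard Helvetica advance widths (space 278, a and b 556, per 1000 em); the line widths are hypothetical examples, and `proposed_stretch` only illustrates the suggested font-size scaling, not any existing FOP code.

```python
# Sketch: assumes FOP's defaults (Helvetica, 12pt) and the standard
# Helvetica AFM advance widths, expressed in 1/1000 em.
HELVETICA_WIDTH = {" ": 278, "a": 556, "b": 556}

FIXED_STRETCH = 10008  # the constant stretch discussed above, in millipoints


def width_mpt(text, font_size_pt=12):
    """Width of a run of characters in millipoints (1/1000 pt)."""
    return sum(HELVETICA_WIDTH[ch] for ch in text) * font_size_pt


def stretch_share(line_width_mpt, stretch=FIXED_STRETCH):
    """Fraction of the line width taken up by the fixed stretch."""
    return stretch / line_width_mpt


def proposed_stretch(font_size_pt, n_spaces=3):
    """Hypothetical variant: three spaces measured at the actual font size."""
    return n_spaces * width_mpt(" ", font_size_pt)


# 10008 mpt is exactly three 12pt spaces (3 * 3336)...
assert FIXED_STRETCH == 3 * width_mpt(" ")
# ...and the same widths reproduce the word boxes quoted in comment #12.
assert width_mpt("a" * 18) == 120096
assert width_mpt("b" * 19) == 126768

# Hypothetical line widths: a 6.5in column vs. a 2in column (1in = 72000 mpt).
for label, line_width in [("6.5in", 468000), ("2in", 144000)]:
    print(f"{label}: fixed stretch = {stretch_share(line_width):.2%} of the line")
```

Under these assumptions the same 10008 mpt is roughly 2% of a 6.5in line but about 7% of a 2in column, which is the side-effect described above.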

On the one hand, the TextLM optimizes the search for linebreaks by merging
each word into a single element, not one element per character. (Even with
hyphenation, we only get one box per hyphenated word-fragment.) In terms of the
algorithm, there is no difference between an uninterrupted sequence of
fixed-size boxes and a single box spanning the same width. The most elementary
representation would be one box per regular character and one glue per space.
Since we already know that the letter-boxes will be kept together, we generate
only the one box. If hyphenation is enabled, the word-box is later split into
multiple boxes, with additional flagged penalties in between.
On the other hand, spaces generate multiple elements for one single space
character (and sequences of space-characters are currently not glued together
to a single element, IIC).
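The merging described above can be sketched as follows. This is a simplified Python illustration, not the TextLM's code: the class names are hypothetical, the 6672 and 3336 widths assume 12pt Helvetica, the hyphen width 3996 assumes the Helvetica hyphen at 12pt, and the hyphen penalty value of 50 is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: int


@dataclass(frozen=True)
class Glue:
    width: int


@dataclass(frozen=True)
class Penalty:
    width: int
    value: int
    flagged: bool = False


def elementary(text, char_width, space_width):
    """Most elementary representation: one box per character, one glue per space."""
    return [Glue(space_width) if ch == " " else Box(char_width) for ch in text]


def merge_boxes(elements):
    """Collapse each run of adjacent boxes into one box of the same total width."""
    merged = []
    for el in elements:
        if isinstance(el, Box) and merged and isinstance(merged[-1], Box):
            merged[-1] = Box(merged[-1].width + el.width)
        else:
            merged.append(el)
    return merged


def hyphenated_word(fragments, char_width, hyphen_width, hyphen_penalty=50):
    """One box per word fragment, separated by flagged penalties (break options).

    hyphen_penalty is a hypothetical value chosen only for illustration.
    """
    elements = []
    for i, fragment in enumerate(fragments):
        if i > 0:
            elements.append(Penalty(hyphen_width, hyphen_penalty, flagged=True))
        elements.append(Box(len(fragment) * char_width))
    return elements


print(merge_boxes(elementary("aaa bb", 6672, 3336)))
print(hyphenated_word(["hy", "phen", "ation"], 6672, 3996))
```

The merged list has the same total width as the per-character one, so the choice of breaks is unaffected; only the number of elements the algorithm must scan shrinks.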

Looking closer at the Wiki again, I realize that the sequence for a simple
preserved space looks surprisingly similar to that of a simple break in case of
centered text, apart from the stretch/shrink... and in that case, the trailing
glue there is /meant/ to always be pushed to the next line.

> it seems that a possible fix to this undesirable behavior is to move the break
> possibility from the beginning to the end of the boilerplate sequence. 

Could indeed very well be the solution. If so, the auxiliary box may not even
be needed anymore (?)
I'll look into it. At any rate, it seems like the sequence should be
drastically simplified. Specifying white-space-preserve should not mean that
suddenly, it becomes more attractive to break before the space. The break
should still be strongly discouraged. In the most elementary case, if a glue is
preceded by a box, that condition is easily satisfied.
I think the cases where white-space-preserve really plays a part come down to:
1) white space around preserved linefeeds
2) necessary breaks in the middle of a sequence of non-collapsed white-space

For 1), the solution so far has been to end the current paragraph and start a
new one. One TextLM returns a sequence of element-lists to the LineLM.
If a space were simply represented by a glue, it would dissolve higher up. Due
to the added auxiliary box, at least the auxiliary glue is preserved and does
generate the right effect here.
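Case 1) can be sketched as splitting the text at each preserved linefeed into its own element list, with leading preserved spaces protected by a zero-width auxiliary box and an infinite penalty (the tail of the preserved-space sequence quoted in this bug). This is a hypothetical, heavily simplified Python model, not the TextLM: tuples stand in for Knuth elements, the widths assume 12pt Helvetica, and the fill stretch of 216000 is simply copied from the sequences in comment #12.

```python
INFINITE = 1000  # Knuth's convention: a penalty of 1000 or more forbids a break


def line_to_elements(line, char_w, space_w, fill_stretch):
    """Element list for one paragraph between preserved linefeeds (simplified)."""
    elements = []
    for ch in line:
        if ch == " ":
            if not elements:
                # Leading preserved space: the zero-width aux box keeps the glue from
                # being discarded, and the infinite penalty keeps it from being a break.
                elements += [("aux-box", 0), ("penalty", 0, INFINITE)]
            elements.append(("glue", space_w, 0))
        elif elements and elements[-1][0] == "box":
            elements[-1] = ("box", elements[-1][1] + char_w)  # extend the word box
        else:
            elements.append(("box", char_w))
    # Paragraph ending: no break here, fill glue, then a forced break.
    elements += [("penalty", 0, INFINITE), ("glue", 0, fill_stretch), ("penalty", 0, -INFINITE)]
    return elements


def text_to_paragraphs(text, char_w=6672, space_w=3336, fill_stretch=216000):
    """One element list per linefeed-delimited paragraph, as handed to the LineLM."""
    return [line_to_elements(line, char_w, space_w, fill_stretch)
            for line in text.split("\n")]


for paragraph in text_to_paragraphs("aa\n  bb"):
    print(paragraph)
```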

For 2), I'm thinking of very extreme (and highly unusual) cases, where it
becomes necessary to choose 'a' break, but the choice is between white-space
characters only. If white-space-treatment is "preserve", a portion of
white-space should, strictly speaking, be pushed to the next line, and
influence alignment there... but ideally, if it all fits on one line, that
possibility should obviously be preferred above all else.

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

Glenn Adams <ga...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW

--- Comment #22 from Glenn Adams <ga...@apache.org> 2012-04-11 06:16:14 UTC ---
change status from ASSIGNED to NEW for consistency

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #20 from Vincent Hennebert <vh...@gmail.com> 2009-08-06 03:22:37 PDT ---
Hi Sean,

I'm afraid this bug doesn't seem to be high on the priority list of any of the
committers. The issue is both non-trivial and located in non-trivial code, so
fixing it would require considerable involvement.

You might be happy with the following workaround, though: in a pre-processing
step, replace every space character with a non-breaking space (U+00A0) followed
by a zero-width space (U+200B). That will force the line-breaking algorithm to
break after the space and not before.
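The pre-processing step could look like the following minimal Python sketch using the standard `xml.etree.ElementTree` module. Assumptions: only text directly inside `fo:block` and `fo:inline` (and the tails of their children) is rewritten, and whitespace-only runs containing a linefeed are left alone on the heuristic assumption that they are source indentation; `protect_spaces` and `protect_tree` are hypothetical names.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
TEXT_CONTAINERS = {f"{{{FO_NS}}}block", f"{{{FO_NS}}}inline"}
NBSP = "\u00a0"  # NO-BREAK SPACE: keeps the visible space, no break before it
ZWSP = "\u200b"  # ZERO WIDTH SPACE: a break opportunity with no width


def protect_spaces(text):
    """Replace each U+0020 with NBSP + ZWSP so a break can only follow the space."""
    # Heuristic: leave whitespace-only runs containing a linefeed untouched,
    # assuming they are indentation between elements rather than content.
    if not text or (not text.strip() and "\n" in text):
        return text
    return text.replace(" ", NBSP + ZWSP)


def protect_tree(root):
    """Apply protect_spaces to the character content of fo:block and fo:inline."""
    for el in root.iter():
        if el.tag in TEXT_CONTAINERS:
            el.text = protect_spaces(el.text)
            for child in el:
                child.tail = protect_spaces(child.tail)
    return root


ET.register_namespace("fo", FO_NS)
doc = ET.fromstring(f'<fo:block xmlns:fo="{FO_NS}">  a b<fo:inline>c d</fo:inline> e</fo:block>')
print(ET.tostring(protect_tree(doc), encoding="unicode"))
```

The indentation heuristic is a trade-off: a deliberate space between two elements survives, but a pretty-printed block that mixes text with indented children may still need manual attention.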

HTH,
Vincent

(In reply to comment #19)
> I know it's been a long time, but I was just wondering if anyone was able to
> get anywhere on the last remaining issue described in this bug.  The last
> discussion centered around whether the Knuth sequence for preserved whitespace
> was more complicated than it needed to be and that, possibly, by moving the
> penalty=0 after the glue instead of before the issue would be fixed...assuming
> it doesn't then cause problems with alignment/justification.
> 
> As predicted, one of my clients is finally complaining about the behavior and
> is asking when it will be fixed.  Obviously I can jump in and try to fix myself
> to help in the effort, but I'm guessing that in the time it takes me to learn
> the layout algorithm and Knuth concepts someone with more experience in this
> stuff could have already resolved the issue.  Plus, it sounds like Andreas
> might have already started working on a fix?

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #19 from Sean Griffin <sg...@cerner.com>  2009-08-03 11:03:00 PST ---
I know it's been a long time, but I was just wondering if anyone was able to
get anywhere on the last remaining issue described in this bug.  The last
discussion centered around whether the Knuth sequence for preserved whitespace
was more complicated than it needed to be and whether moving the penalty=0
after the glue instead of before it would fix the issue...assuming that doesn't
then cause problems with alignment/justification.

As predicted, one of my clients is finally complaining about the behavior and
is asking when it will be fixed.  Obviously I can jump in and try to fix it myself
to help in the effort, but I'm guessing that in the time it takes me to learn
the layout algorithm and Knuth concepts someone with more experience in this
stuff could have already resolved the issue.  Plus, it sounds like Andreas
might have already started working on a fix?

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #12 from Sean Griffin <sg...@cerner.com>  2008-10-26 20:26:01 PST ---
I've dusted off the investigation of this issue in an effort to possibly fix
it.  I've done some research into the Knuth related concepts and read some old
mailing list entries from Simon and Manuele around 2006 trying to get my head
around the layout manager behavior.  I'm still not sure I have it, but I'll
take a crack at it.

Given markup like this:

<fo:block><fo:inline>aaaaaaaaaaaaaaaaaa  bbbbbbbbbbbbbbbbbbb</fo:inline></fo:block>

* Note: there are 2 spaces (0x20) between the a's and the b's.

The BreakingAlgorithm is given a KnuthSequence constructed with these elements:

[box w=120096, glue w=0 stretch=10008 shrink=0, aux. penalty p=0 w=0, glue
w=3336 stretch=-10008 shrink=0, box w=126768, penalty p=INFINITE w=0, glue w=0
stretch=216000 shrink=0, penalty p=-INFINITE w=0 (forced break)]

Based on my novice analysis, it appears the various KnuthElements serve the
following purposes:

box w=120096 --> string of 'a' characters
glue w=0 stretch=10008 shrink=0 --> handles alignment in case the following
possible break is honored?
aux. penalty p=0 w=0 --> possible break
glue w=3336 stretch=-10008 shrink=0 --> 1 character of whitespace
box w=126768 --> string of 'b' characters

Now, if I add white-space-collapse="false" and white-space-treatment="preserve"
to the block in the markup above I get this KnuthSequence in the
BreakingAlgorithm:

[box w=120096, glue w=0 stretch=10008 shrink=0, aux. penalty p=0 w=0, glue w=0
stretch=-10008 shrink=0, aux. box w=0, aux. penalty p=INFINITE w=0, glue w=3336
stretch=0 shrink=0, glue w=0 stretch=10008 shrink=0, aux. penalty p=0 w=0, glue
w=0 stretch=-10008 shrink=0, aux. box w=0, aux. penalty p=INFINITE w=0, glue
w=3336 stretch=0 shrink=0, box w=126768, penalty p=INFINITE w=0, glue w=0
stretch=216000 shrink=0, penalty p=-INFINITE w=0 (forced break)]

My analysis of their purposes is as follows:

box w=120096 --> string of 'a' characters

glue w=0 stretch=10008 shrink=0 --> handles alignment in case the following
possible break is honored?
aux. penalty p=0 w=0 + glue w=0 stretch=-10008 shrink=0 --> possible break
aux. box w=0 --> prevents whitespace removal
aux. penalty p=INFINITE w=0 --> disables next glue from being break possibility
glue w=3336 stretch=0 shrink=0 --> 1 character of whitespace on line 1

glue w=0 stretch=10008 shrink=0 --> handles alignment in case the following
possible break is honored?
aux. penalty p=0 w=0 + glue w=0 stretch=-10008 shrink=0 --> possible break
aux. box w=0 --> prevents whitespace removal
aux. penalty p=INFINITE w=0 --> disables next glue from being break possibility
glue w=3336 stretch=0 shrink=0 --> 1 character of whitespace on line 2

box w=126768 --> string of 'b' characters

This matches what Andreas shows as the sequence for a preserved space (glue,
penalty=0, glue, aux. box w=0, penalty=inf, glue).  Is my analysis of each
KnuthElement and the purpose it serves correct?  I still don't understand how
it gets the stretch values that it does, but ignoring that for now, it seems
that a possible fix to this undesirable behavior is to move the break
possibility from the beginning to the end of the boilerplate sequence. 
Something like this...

aux. penalty p=INFINITE w=0 --> disables next glue from being break possibility
glue w=3336 stretch=0 shrink=0 --> 1st character of whitespace on line 1
glue w=0 stretch=? shrink=0 --> in case the following possible break is
honored?
aux. penalty p=0 w=0 + glue w=0 stretch=? shrink=0 --> possible break

It seems this would move the possible break /after/ the glue with w=3336,
thereby keeping the preserved spaces all on the first line and ensuring the
next line after the break starts at the left margin.
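To make the proposed reordering concrete, the sketch below builds both variants for a single preserved space and checks two properties: where the p=0 break option sits relative to the glue that carries the space width, and that the unbroken sequence keeps the same natural width and net stretch. It is a hypothetical Python model, not FOP code; `st` stands in for the stretch values the proposal leaves open ("?").

```python
from dataclasses import dataclass

INFINITE = 1000  # a penalty at or above this value forbids a break


@dataclass(frozen=True)
class El:
    kind: str          # "box", "glue" or "penalty"
    width: int = 0
    stretch: int = 0
    penalty: int = 0


def current_space(space_w=3336, st=10008):
    """Preserved space as listed in comment #12: break option before the space."""
    return [El("glue", 0, st), El("penalty"), El("glue", 0, -st),
            El("box"), El("penalty", penalty=INFINITE), El("glue", space_w)]


def proposed_space(space_w=3336, st=10008):
    """Reordered variant: the space first, the break option after it.

    st is a placeholder; the proposal leaves these stretch values open ("?").
    """
    return [El("penalty", penalty=INFINITE), El("glue", space_w),
            El("glue", 0, st), El("penalty"), El("glue", 0, -st)]


def break_before_space(seq):
    """True if the p=0 penalty precedes the glue that carries the space width."""
    pen = next(i for i, e in enumerate(seq) if e.kind == "penalty" and e.penalty == 0)
    spc = next(i for i, e in enumerate(seq) if e.kind == "glue" and e.width > 0)
    return pen < spc


def natural(seq):
    """(width, net stretch) of the sequence when no break is taken inside it."""
    return sum(e.width for e in seq), sum(e.stretch for e in seq)


for name, seq in [("current", current_space()), ("proposed", proposed_space())]:
    print(f"{name}: break before space = {break_before_space(seq)}, natural = {natural(seq)}")
```

Both variants behave identically when no break is taken inside them; only the position of the break option relative to the 3336-wide glue changes, which is what the proposal relies on.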

This is complicated stuff, so I apologize if I have it all wrong, but I know
it's just a matter of time before my clients complain about this behavior, and
I'd like to help fix the issue if possible before it blows up.

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #21 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:44:24 UTC ---
resetting P2 open bugs to P3 pending further review

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #16 from Andreas L. Delmelle <ad...@apache.org>  2008-11-26 11:10:50 PST ---
(In reply to comment #15)

<snip />

Thanks for the rectification, although I wasn't really wrong. At most, not
accurate enough. ;-P

> <snip/>
> > For 2), I'm thinking of very extreme (and highly unusual) cases, where it
> > becomes necessary to choose 'a' break, but the choice is between white-space
> > characters only. If white-space treatment is "preserve",  a portion of
> > white-space should, strictly speaking, be pushed to the next line, and
> > influence alignment there... but ideally, if it all fits on one line, that
> > possibility should obviously be preferred above all else.
> 
> This is probably the biggest issue. This may require to handle a sequence of
> white spaces in its whole instead of each character individually. Sorry, I
> don't have enough energy ATM to look at this issue into more details. Being
> sure that every combination of white space options (white-space-treatment,
> white-space-collapse, linefeed-treatment...) is handled correctly requires an
> extensive study.

I was thinking about introducing a special type of auxiliary glue that can be
broken in two at a position that is not fixed when the element is generated
(more like a combination of two glues whose combined width is known, but not
the widths of the two individual elements).
The LineLM would then treat this as a whole (though not an unbreakable one),
determine how large a portion fits on the current line, and insert an auxiliary
box for the remaining width (rather than /always/ adding that auxiliary box in
the TextLM when white-space-treatment='preserve').
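A minimal sketch of that idea in Python, with hypothetical names throughout: `SplittableGlue` only knows the total width of a run of preserved spaces, and `split_run` decides at line-breaking time how much of it stays on the current line and how much becomes an auxiliary box on the next. Splitting only between whole spaces is an assumption of this sketch; the widths assume 12pt Helvetica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplittableGlue:
    """Hypothetical auxiliary glue for a run of preserved spaces.

    Only the total width is fixed when the element is generated; where it
    is split is decided later, during line breaking.
    """
    space_width: int
    count: int

    @property
    def width(self):
        return self.space_width * self.count


def split_run(run, available):
    """Fit as many whole spaces as possible into the available width.

    Returns (glue kept on this line, auxiliary box carried to the next line,
    or None if the whole run fits).
    """
    fitting = min(run.count, max(available, 0) // run.space_width)
    rest = run.count - fitting
    kept = ("glue", fitting * run.space_width)
    carried = ("aux-box", rest * run.space_width) if rest else None
    return kept, carried


run = SplittableGlue(space_width=3336, count=5)
print(split_run(run, available=10000))
print(split_run(run, available=20000))
```

When the whole run fits, no auxiliary box is produced at all, which matches the preference stated in case 2) that keeping everything on one line should win.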

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #15 from Vincent Hennebert <vh...@gmail.com>  2008-11-26 03:28:18 PST ---
(In reply to comment #14)
> (In reply to comment #12)
> 
> Sorry to chime in so late...
> 
> > Based on my novice analysis, it appears the various KnuthElements provide the
> > following purposes:
> <snip />
> 
> Entirely correct interpretation.
> 
> A box is never a break-possibility, unless when preceded by a penalty
> indicating one. Glues are always a break-possibility, unless when preceded by a
> penalty prohibiting one. That's the general idea. 

I'm afraid this is wrong. You can break only at two places:
- a penalty element whose penalty value is not infinite; then the width of the
penalty must be taken into account.
- a glue element that's immediately preceded by a box; then you discard the
glue's length, shrink and stretch.
Also, when an element is chosen as a breaking point, all the following glue and
penalty elements (if any) are discarded up to the next box element. The
presence of aux. box w=0 at places in the sequence is meant to prevent the
triggering of that mechanism.
See section “Breaking Rules” at the following page:
http://wiki.apache.org/xmlgraphics-fop/KnuthsModel
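The two rules and the discard behaviour can be modelled in a few lines. This is a simplified Python sketch of the rules as stated above, not FOP's BreakingAlgorithm; the element class and function names are hypothetical, and stretch/shrink are omitted because they do not affect whether a break is legal. It shows that with the zero-width auxiliary box the preserved space survives a break at the p=0 penalty (and so starts the next line), while without it the space is discarded.

```python
from dataclasses import dataclass

INFINITE = 1000  # a penalty at or above this value forbids a break


@dataclass(frozen=True)
class El:
    kind: str       # "box", "glue" or "penalty"
    width: int = 0
    penalty: int = 0


def is_legal_break(seq, i):
    """Rule 1: a non-infinite penalty. Rule 2: a glue immediately preceded by a box."""
    e = seq[i]
    if e.kind == "penalty":
        return e.penalty < INFINITE
    if e.kind == "glue":
        return i > 0 and seq[i - 1].kind == "box"
    return False


def next_line_start(seq, i):
    """After a break at i, glue and penalties are discarded up to the next box."""
    j = i + 1
    while j < len(seq) and seq[j].kind != "box":
        j += 1
    return j


# A preserved space between two words, as in comment #12 (stretch omitted).
with_aux = [El("box", 120096),
            El("glue"), El("penalty"), El("glue"),
            El("box"), El("penalty", penalty=INFINITE), El("glue", 3336),
            El("box", 126768)]
without_aux = with_aux[:4] + with_aux[5:]  # same sequence minus the aux. box

for name, seq in [("with aux. box", with_aux), ("without aux. box", without_aux)]:
    breaks = [i for i in range(len(seq)) if is_legal_break(seq, i)]
    start = next_line_start(seq, breaks[-1])
    print(name, "breaks:", breaks, "next line:", [e.kind for e in seq[start:]])
```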

<snip/>
> For 2), I'm thinking of very extreme (and highly unusual) cases, where it
> becomes necessary to choose 'a' break, but the choice is between white-space
> characters only. If white-space treatment is "preserve",  a portion of
> white-space should, strictly speaking, be pushed to the next line, and
> influence alignment there... but ideally, if it all fits on one line, that
> possibility should obviously be preferred above all else.

This is probably the biggest issue. It may require handling a sequence of
white spaces as a whole instead of each character individually. Sorry, I
don't have enough energy ATM to look into this issue in more detail. Making
sure that every combination of white space options (white-space-treatment,
white-space-collapse, linefeed-treatment...) is handled correctly requires an
extensive study.

Vincent

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #18 from Sean Griffin <sg...@cerner.com>  2008-11-26 12:55:58 PST ---
I think I know enough now to know that this issue is clearly over my head :)

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #13 from Manuel Mall <ma...@apache.org>  2008-10-26 21:38:59 PST ---
Sean,

I haven't analysed what you wrote on the Knuth sequences but it may be
worthwhile, unless you have done it already, to compare it against
http://wiki.apache.org/xmlgraphics-fop/LineBreaking.

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #17 from Andreas L. Delmelle <ad...@apache.org>  2008-11-26 12:34:10 PST ---
(In reply to comment #16)
> (In reply to comment #15)
> > 
> > This is probably the biggest issue. This may require to handle a sequence of
> > white spaces in its whole instead of each character individually. Sorry, I
> > don't have enough energy ATM to look at this issue into more details. Being
> > sure that every combination of white space options (white-space-treatment,
> > white-space-collapse, linefeed-treatment...) is handled correctly requires an
> > extensive study.
> 
> I was thinking about introducing a special type of auxiliary glue, ...

Or maybe we could even benefit from a space-resolution pass in line-layout
too: replace each white-space sequence with one unresolved SpaceElement, and
resolve those in the LineLM, at the end of collecting the inline elements for a
paragraph. 
That would probably be the most comprehensive approach, since it could then be
folded into space-start/space-end resolution (currently non-functional), and it
would make it much easier to detect sequences of consecutive preserved
white-space characters across FO boundaries...
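The collection step of such a pass might look like the following Python sketch. Everything here is hypothetical: `SpaceRun` stands in for the unresolved SpaceElement, the input is a list of (FO name, text) fragments for one paragraph, and resolution itself (deciding widths and breaks in the LineLM) is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SpaceRun:
    """Hypothetical unresolved space element: one run of preserved spaces."""
    count: int = 0
    sources: list = field(default_factory=list)  # FOs the run spans, in order


def collect(fragments):
    """Turn (fo_name, text) fragments of one paragraph into words and SpaceRuns.

    Consecutive spaces are merged into a single SpaceRun even when they come
    from different FOs, so the LineLM could resolve each run as a whole.
    """
    tokens = []
    for fo, text in fragments:
        for ch in text:
            if ch == " ":
                if not (tokens and isinstance(tokens[-1], SpaceRun)):
                    tokens.append(SpaceRun())
                run = tokens[-1]
                run.count += 1
                if not run.sources or run.sources[-1] != fo:
                    run.sources.append(fo)
            elif tokens and isinstance(tokens[-1], str):
                tokens[-1] += ch  # extend the current word
            else:
                tokens.append(ch)
    return tokens


print(collect([("block", "aa "), ("inline", " bb"), ("block", "  c")]))
```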

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

chornsey@hotmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chornsey@hotmail.com

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #11 from Andreas L. Delmelle <ad...@apache.org>  2008-05-31 04:00:19 PST ---

Fix applied to FOP Trunk.
see: http://svn.apache.org/viewvc?rev=661999&view=rev

I'm keeping the issue open FTM, as a reminder for the dubious/inelegant way of
handling preserved white-space around formatter-generated linebreaks. Strictly
speaking not a bug, but I agree with the reporter that the current behavior is
not really what it should be...

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #10 from Andreas L. Delmelle <ad...@apache.org>  2008-05-31 03:58:24 PST ---
Created an attachment (id=22041)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22041)
Result after applying the fix


FWIW: the issues with Example 1 and Example 2 have been fixed in FOP trunk. In
both cases, with or without wrapping block, the result is now as in the
attached PDF.

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #1 from Sean Griffin <sg...@cerner.com>  2008-05-29 08:04:11 PST ---
Created an attachment (id=22030)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22030)
PDF example without wrapping block

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

Sean Griffin <sg...@cerner.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #22032|PDF example without wrapping|PDF example with wrapping
        description|block                       |block

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #8 from Sean Griffin <sg...@cerner.com>  2008-05-29 17:08:39 PST ---
(In reply to comment #7)
> ...if the line-breaking algorithm
> has a choice of either breaking before or after a space, it will always break
> before it.

Here's where I think our opinions might differ.  I believe it should break
after the space.  I equate the significance of the space character in line
wrapping with that of a hyphen.  Technically I know they are quite different,
but functionally, with hyphenation, the break is placed *after* the hyphen, not
before, and it seems the same rule should be used with spaces.

To test this theory I first opened up MS Word, turned on the "Show Formatting
Marks" option, and typed a few lines of text that Word wraps on its own.  The
space characters are kept on the line before the wrap as opposed to after.

Since XSL is based on CSS, I wondered what happened in internet browsers with
HTML, so I tried the same thing there with a span border on a large block of
text.  Internet Explorer keeps the space at the end of the prior line before
the wrap.  Firefox trims the space similar to the XSL
white-space-treatment="ignore-if-surrounding-linefeed", so it didn't really
apply there.  Unfortunately CSS doesn't have the level of control over
whitespace that XSL does, but it seems the root of the issue isn't
white-space-treatment but how the line-areas are created.

I searched the spec for quite a while trying to find where it clearly says
which line-area gets the whitespace in a wrapping block-area, but I couldn't
find it.  The closest I found was this, which unfortunately is a little
ambiguous.

4.7.2 Line-building
The partitioning occurs at legal line-breaks. Specifically, if A is the last
area of S_i and B is the first area of S_(i+1), then the rules of the language,
script and hyphenation constraints ... in effect must permit a line-break
between A and B, within the context of all areas in S_i and S_(i+1).

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

Andreas L. Delmelle <ad...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Andreas L. Delmelle <ad...@apache.org>  2008-05-29 10:12:40 PST ---
(In reply to comment #0)
<snip />

Thanks for the extensive report, and the testcases!

> To summarize, I see 3 questionable items:
> 1. Shouldn't the whitespace_without_wrapping_block.pdf match the
> whitespace_with_wrapping_block.pdf?

Confirmed. Something is definitely wrong here.

> 2. In whitespace_without_wrapping_block.pdf, is the behavior of Example 2
> correct where whitespace is preserved inside inline elements even when
> whitespace-treatment != "preserve"?

No, this is definitely a bug. The behavior seems to be wrong in both cases. 
The result should be identical to Example 3, only with additional borders.

Example 1 is also incorrect with the wrapping block. The trailing whitespace on
the last line should definitely be preserved.

Technically:
XMLWhiteSpaceHandler does not seem to properly remove the leading/trailing
spaces in the inlines due to an implicit start-of-block/end-of-block in Example
2.
On the one hand the afterLinefeed member is not correctly set when
handleWhiteSpace() is entered the first time for the surrounding block. Easily
fixed.
On the other hand, the pendingInlines are not processed when handleWhiteSpace()
is entered the second time for that block, when the block ends. Slightly more
complicated, but still quite straightforward.
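The intended outcome for Example 2 can be stated as a small transformation: spaces at the implicit start and end of the block are removed even when they sit inside the first or last inline. The following Python sketch models only that edge case (linefeeds inside the text are not handled), taking the block's character content as a list of text fragments in document order; `trim_block_edges` is a hypothetical illustration, not FOP's XMLWhiteSpaceHandler.

```python
def trim_block_edges(fragments):
    """Strip spaces at the block's implicit start and end, across inline boundaries.

    fragments: the block's character content in document order, one string per
    text node (text directly in the block or inside nested fo:inline elements).
    """
    out = list(fragments)
    # Leading edge: strip fragments until one still has content after stripping.
    for i, text in enumerate(out):
        out[i] = text.lstrip(" ")
        if out[i]:
            break
    # Trailing edge: the same, walking backwards from the end of the block.
    for i in range(len(out) - 1, -1, -1):
        out[i] = out[i].rstrip(" ")
        if out[i]:
            break
    return out


# e.g. <fo:block><fo:inline>  Example </fo:inline></fo:block>
print(trim_block_edges(["", "  Example ", ""]))
```

Note that the trailing pass can only be decided once the last fragment of the block is known, which is consistent with the end-of-block step described above.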

I'll look into a fix for this soon. For now, the correct behavior in
whitespace_without_wrapping_block can be simulated by adding a space character
before and after the inline. In that case, white-space removal is properly
triggered and you get the correct result.

> 3. In whitespace_without_wrapping_block.pdf, is there a way to get Example 1
> behavior and Example 5 behavior with the same block property settings (to
> prevent Example 4 behavior)?
> 

Not sure if I'm following here... Can you clarify? Do you wish to override the
behavior of the first /and/ the last line? I know the XSL-FO specification
defines fo:initial-property-set to affect only the first line-area generated by
an fo:block, but FOP does not implement this yet.

DO NOT REPLY [Bug 45097] Questionable white-space-treatment behavior

--- Comment #2 from Sean Griffin <sg...@cerner.com>  2008-05-29 08:04:29 PST ---
Created an attachment (id=22031)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22031)
FO markup for example with wrapping block


--- Comment #9 from Andreas L. Delmelle <ad...@apache.org>  2008-05-30 00:26:57 PST ---
(In reply to comment #8)
> (In reply to comment #7)
> > ...if the line-breaking algorithm
> > has a choice of either breaking before or after a space, it will always break
> > before it.
> 
> Here's where I think our opinions might differ.  I believe it should break
> after the space.  

Oh, but the result /is/ correct, strictly speaking. A bit unexpected, maybe,
but definitely not a bug.

The point is well taken though. I've been looking at the related code, and was
beginning to wonder...

This is more meant for the layout-specialists, but using simple
start-alignment, the sequence currently generated for a preserved space
consists of:
- a glue
- a penalty p=0
- a glue
- an auxiliary box w=0
- a penalty p=INFINITE
- a glue

From a higher-level point of view (the LineLayoutManager), a break on the first
penalty will always be favored over a break on the second, which is why I think
the algorithm chooses to break before the space rather than after. With a
preceding and following word, the above sequence would be enclosed by boxes
corresponding to those words. If the break /has/ to be somewhere in between the
two word-boxes, the preserved space in between always appears at the start of
the next line.

Again, not incorrect, but not the most elegant-looking outcome either.

Actually, it's even slightly worse. Given a sequence of those preserved spaces,
as many as possible will be placed on the line as trailing white-space. That
is: all but the very last one. The zero-penalty appears to be always favored as
the last break in the sequence...

<snip /> 
> I searched in the spec for quite a while trying to find where it clearly says
> which line-area gets the whitespace in a wrapping block-area, but I couldn't
> find it.  The closest I found was this, which unfortunately is a little
> ambiguous.
> 

Indeed, the rules about where exactly line-breaks are supposed to end up are
not defined by XSL-FO itself. FOP uses Unicode UAX#14
(http://www.unicode.org/reports/tr14/) as reference for the most part, which
does not explicitly forbid a break before a space (although it is discouraged,
IIRC).


--- Comment #3 from Sean Griffin <sg...@cerner.com>  2008-05-29 08:04:45 PST ---
Created an attachment (id=22032)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22032)
PDF example without wrapping block


Sean Griffin <sg...@cerner.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




--- Comment #6 from Sean Griffin <sg...@cerner.com>  2008-05-29 11:55:22 PST ---
I think I might have accidentally marked this bug in NEEDINFO status.


--- Comment #7 from Andreas L. Delmelle <ad...@apache.org>  2008-05-29 12:26:22 PST ---
(In reply to comment #5)
> 
> Sorry, I probably wasn't very clear. White-space-preserve is set on both
> Example 1 and Example 4.  The behavior of Example 1 was expected but the
> behavior of Example 4 was *not* expected...at first.  The preservation of the
> space after each formatter-generated line feed looks funny and I thought it was
> a bug.  But after thinking about it and reading the white-space handling
> portion of the specification I began to see why it's being done...after all, we
> are saying to preserve all white space and I saw no mention in the spec that
> formatter-generated line feeds should replace surrounding space characters.  So
> I removed white-space-preserve to make Example 4 look like Example 5 (what I
> want), but of course that made Example 1 look like Example 3 (what I didn't
> want).

Yep, either you preserve white-space surrounding linefeeds or you don't.

Note that white-space-treatment (in XSL-FO 1.1 at least) is defined in terms of
preserving/discarding glyph-areas for XML white-space characters during
line-building. So the preservation is not restricted to spaces surrounding
explicit linefeed-characters.

> Technically this "worked" in FOP 0.20.5, but that's not saying much since it
> had other problems related to white-space handling.  Basically, I don't see
> anyone wanting the behavior shown in Example 4 (unless they actually put in a
> text-indent), so I'm questioning if it's truly working as expected.

The fact that Example 4 only has preserved spaces at the start of the lines is
because all the line-breaks are implicit, and if the line-breaking algorithm
has a choice of either breaking before or after a space, it will always break
before it. The result is therefore correct, even though the chances of anyone
seeking that behavior are very slim. Trailing spaces on a line will normally
only appear if there are also explicit linefeeds or nested blocks, as in:

<fo:block white-space-treatment="preserve">
  <fo:block linefeed-treatment="preserve">text   &#x0A;   text</fo:block>
  <fo:block>text   <fo:block />   text</fo:block>
</fo:block>

> To explain a little about what I'm doing, I'm wrapping user-entered text in a
> block, and I want to ensure I keep their formatting.  But I appear to be in a
> catch-22 because if I do that then I also get this "hanging indent" problem for
> blocks that have more than 1 line area.

Actually, it seems like you want to do more than just 'keep the original
formatting'. This scenario is very different from the case where one would use
preserved linefeeds combined with wrap-option="no-wrap". It seems like you need
a mixture of both, since you also seem to need formatter-generated
linebreaks.
Example 1 is also slightly different from Example 5, since it contains nested
blocks. Moving white-space-treatment="preserve" to the inner blocks may be an
option, but I don't know if that fits in your processing logic (?)
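
A rough sketch of that suggestion, with hypothetical content: the outer block
keeps the default white-space-treatment, and only the inner blocks carrying
user-formatted lines get "preserve". The white-space-collapse="false" setting
is an addition not discussed in this thread; without it, a run of several
leading spaces would be collapsed into a single space. Long paragraphs that need
formatter-generated line breaks stay in blocks with the default treatment, so
they should not get the Example 4 effect. For contrast, the last block shows the
verbatim-style setup (preserved linefeeds with wrap-option="no-wrap"), which
keeps the original formatting but never wraps.

<fo:block>
  <!-- short user-formatted line: leading spaces are kept -->
  <fo:block white-space-treatment="preserve" white-space-collapse="false">    indented line</fo:block>
  <!-- long paragraph: default treatment, so no stray spaces after implicit breaks -->
  <fo:block>A long paragraph of user-entered text that the formatter has to wrap over several lines.</fo:block>
</fo:block>

<!-- Verbatim-style alternative: keeps linefeeds and spaces, but no wrapping -->
<fo:block linefeed-treatment="preserve" white-space-collapse="false"
          white-space-treatment="preserve" wrap-option="no-wrap">    indented line
second line</fo:block>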


Sean Griffin <sg...@cerner.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO




--- Comment #5 from Sean Griffin <sg...@cerner.com>  2008-05-29 11:47:27 PST ---
(In reply to comment #4)
> (In reply to comment #0)
> 
> Thanks for the extensive report, and the testcases!

Glad I could help!

> > 3. In whitespace_without_wrapping_block.pdf, is there a way to get Example 1
> > behavior and Example 5 behavior with the same block property settings (to
> > prevent Example 4 behavior)?
> > 
> 
> Not sure if I'm following here... Can you clarify? Do you wish to override the
> behavior of the first /and/ the last line? I know the XSL-FO specification
> defines fo:initial-property-set to affect only the first line-area generated by
> an fo:block, but FOP does not implement this yet.
> 

Sorry, I probably wasn't very clear. White-space-preserve is set on both
Example 1 and Example 4.  The behavior of Example 1 was expected but the
behavior of Example 4 was *not* expected...at first.  The preservation of the
space after each formatter-generated line feed looks funny and I thought it was
a bug.  But after thinking about it and reading the white-space handling
portion of the specification I began to see why it's being done...after all, we
are saying to preserve all white space and I saw no mention in the spec that
formatter-generated line feeds should replace surrounding space characters.  So
I removed white-space-preserve to make Example 4 look like Example 5 (what I
want), but of course that made Example 1 look like Example 3 (what I didn't
want).

Technically this "worked" in FOP 0.20.5, but that's not saying much since it
had other problems related to white-space handling.  Basically, I don't see
anyone wanting the behavior shown in Example 4 (unless they actually put in a
text-indent), so I'm questioning if it's truly working as expected.

To explain a little about what I'm doing, I'm wrapping user-entered text in a
block, and I want to ensure I keep their formatting.  But I appear to be in a
catch-22 because if I do that then I also get this "hanging indent" problem for
blocks that have more than one line area.
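
To make the catch-22 concrete, a hypothetical pair of blocks with the same user
text (not the attached markup). With "preserve", the leading space survives, but
the spaces at the formatter-generated breaks also stay and show up at the start
of the following lines (Example 4). With the initial value
ignore-if-surrounding-linefeed, the wrapped lines look right (Example 5), but the
leading space at the start of the block is discarded too (Example 3).

<!-- white-space-treatment="preserve": leading space kept, but each wrapped
     line starts with the space at the implicit break (Example 4) -->
<fo:block white-space-treatment="preserve"> Indented user text that is long enough to be wrapped by the formatter onto several lines.</fo:block>

<!-- initial value: no stray spaces after implicit breaks (Example 5),
     but the leading space is dropped as well (Example 3) -->
<fo:block white-space-treatment="ignore-if-surrounding-linefeed"> Indented user text that is long enough to be wrapped by the formatter onto several lines.</fo:block>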

