Posted to dev@xalan.apache.org by Scott Boag/CAM/Lotus <Sc...@lotus.com> on 2000/02/03 23:46:22 UTC

That unwanted white space in HTML output

Does anyone have an opinion on changing Xalan's behavior in light of the
note below? I have deliberately followed Clark's whitespace rules, frankly
mainly so that Xalan's output files can be compared with XT's.

This will be moot once the Xerces Serializers become the default for Xalan,
since I believe the Serializers already follow the convention outlined
below.

-scott


----- Forwarded by Scott Boag/CAM/Lotus on 02/03/00 05:43 PM -----

From:     Mike Brown <mbrown@corp.webb.net>
Sent by:  owner-xsl-list@mulberrytech.com
Date:     02/03/00 05:08 PM
To:       "'xsl-list@mulberrytech.com'" <xs...@mulberrytech.com>
cc:       (bcc: Scott Boag/CAM/Lotus)
Subject:  That unwanted white space in HTML output
Please respond to xsl-list


Warren Hedley wrote:
> The whitespace between <a> and <img> elements is a fairly
> common problem [...] can anyone suggest any other element
> types where this behaviour might be necessary?

Yes, all "inline" elements. These are enumerated in the HTML 4 DTDs as the
following:

(strict)
TT | I | B | BIG | SMALL | EM | STRONG | DFN | CODE | SAMP | KBD | VAR |
CITE | ABBR | ACRONYM | A | IMG | OBJECT | BR | SCRIPT | MAP | Q | SUB |
SUP | SPAN | BDO | INPUT | SELECT | TEXTAREA | LABEL | BUTTON

(transitional)
TT | I | B | U | S | STRIKE | BIG | SMALL | EM | STRONG | DFN | CODE | SAMP |
KBD | VAR | CITE | ABBR | ACRONYM | A | IMG | APPLET | OBJECT | FONT |
BASEFONT | BR | SCRIPT | MAP | Q | SUB | SUP | SPAN | BDO | IFRAME | INPUT |
SELECT | TEXTAREA | LABEL | BUTTON

I believe a clause should be included in a future version of the XSLT spec:
"When emitting a result tree as HTML, whitespace should never be added
inside inline elements."

Example:

What would normally be emitted as unindented XML like this:
<p><a href="foo"><img src="bar"/></a><br/>some text</p>

...could be emitted as indented HTML like this:
<p>
<a href="foo"><img src="bar"/></a><br/>some text
</p>
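
As a rough illustration of the proposed rule, here is a minimal Java sketch of
the check a serializer could make before adding indentation. The class and
method names are made up for this example and are not taken from Xalan, Xerces
or XT; the element set is simply the transitional inline list quoted above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Xalan, Xerces or XT. */
final class InlineWhitespaceRule {

    // Inline elements of the HTML 4 transitional DTD, as listed in this message.
    private static final Set<String> INLINE = new HashSet<>(Arrays.asList(
        "tt", "i", "b", "u", "s", "strike", "big", "small", "em", "strong",
        "dfn", "code", "samp", "kbd", "var", "cite", "abbr", "acronym", "a",
        "img", "applet", "object", "font", "basefont", "br", "script", "map",
        "q", "sub", "sup", "span", "bdo", "iframe", "input", "select",
        "textarea", "label", "button"));

    private InlineWhitespaceRule() { }

    /** HTML element names are case-insensitive, so normalize before the lookup. */
    static boolean isInline(String elementName) {
        return INLINE.contains(elementName.toLowerCase(Locale.ROOT));
    }

    /**
     * The proposed rule: whitespace may be added inside an element
     * only when that element is not an inline element.
     */
    static boolean mayAddWhitespaceInside(String elementName) {
        return !isInline(elementName);
    }
}
```

With this check, mayAddWhitespaceInside("p") is true, so line breaks just
inside <p> are allowed, while mayAddWhitespaceInside("a") is false, so the
<a> and <img> in the example stay on one line.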


This rule is needed because any added whitespace, together with any adjacent
whitespace, is interpreted as a single "word separator" relative to adjacent
text. The browser is supposed to render this separator in a manner
appropriate to the script of the language being used, which isn't something
that is always predictable. In Latin-script languages, the word separator
is a breaking space.
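
To make the collapsing concrete, here is a small Java sketch that models it
with a regular expression. It is only a rough model applied to the raw text,
not a description of how any browser or serializer is implemented, and the
class name is made up for this example.

```java
/** Hypothetical illustration of whitespace collapsing; not browser or XT code. */
final class WhitespaceCollapse {

    private WhitespaceCollapse() { }

    /**
     * Replaces each run of space, tab, form feed, line feed and carriage
     * return with a single space, which is roughly how a run of whitespace
     * ends up as one word separator. Zero-width space is left out for brevity.
     */
    static String collapse(String text) {
        return text.replaceAll("[ \\t\\f\\n\\r]+", " ");
    }
}
```

For example, collapse("<a href=\"foo\">\n  <img src=\"bar\">\n</a>") still
leaves one space on each side of the <img> tag, so the indented form is not
equivalent to <a href="foo"><img src="bar"></a>.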

In the case of inline images, applets and objects, you end up with the
image, applet or object being equivalent to some text, with the bottom edge
aligned along the baseline of adjacent text, as per the spec. This is
normally desirable behavior, but can be problematic if you are trying to
stack images on top of each other. The space allotted for descending
characters and the space between the bottom edge of descenders and the top
edge of the next row of text is often undesirable.

I made an example of this at http://www.skew.org/xml/misc_demos/whitespace/
and reported it to James Clark as an argument for changing the behavior of
XT's HTMLOutputHandler. He gave me a simple "thanks" for the info, but the
problem has yet to be resolved.

In the meantime, I've modified HTMLOutputHandler.java with an ugly
workaround, removing 'br' from the list of blockElements (its presence
there seems to be an error anyway). This of course doesn't resolve every
situation, but was enough for my purposes, for now.


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list