Posted to dev@xalan.apache.org by "Brian Minchau (JIRA)" <xa...@xml.apache.org> on 2005/06/03 23:57:55 UTC

[jira] Commented: (XALANJ-2109) \r\n in an HTML attribute is incorrectly output as \r\r\n

    [ http://issues.apache.org/jira/browse/XALANJ-2109?page=comments#action_12312580 ] 

Brian Minchau commented on XALANJ-2109:
---------------------------------------


Although this issue is resolved, it may be of interest that XALANJ-2093, which will be in the Xalan-J 2.7 release, will let you specify what \n is normalized to on output. For example, with <xsl:output xalan:line-separator="&#10;"/> the serializer won't use the runtime library's value for the line separator.
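
If the transformer is driven from Java rather than from a stylesheet, the same setting can probably be requested through the standard JAXP API. This is a minimal sketch under stated assumptions: the class name LineSeparatorDemo is hypothetical, the expanded property name assumes the xalan prefix is bound to http://xml.apache.org/xalan (as in the Xalan-J documentation), and only a processor that recognizes the property (Xalan-J 2.7 or later, per the comment above) should be expected to honor it. Other processors may accept and ignore it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class LineSeparatorDemo {
    // Assumption: xalan:line-separator expands to this name, i.e. the xalan
    // prefix is bound to http://xml.apache.org/xalan as in the Xalan-J docs.
    static final String LINE_SEPARATOR = "{http://xml.apache.org/xalan}line-separator";

    /** Identity transformer that requests HTML output with "\n" as the line separator. */
    static Transformer newTransformer() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            // Intended to mirror <xsl:output xalan:line-separator="&#10;"/>;
            // processors that do not know this property may simply ignore it.
            t.setOutputProperty(LINE_SEPARATOR, "\n");
            return t;
        } catch (TransformerException e) {
            throw new IllegalStateException("could not create transformer", e);
        }
    }

    /** Serializes an XML string through the configured identity transformer. */
    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<p>a</p>"));
    }
}
```

Setting the property on the Transformer affects only that instance, so other transformations in the same JVM keep their defaults.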

>  \r\n  in an HTML attribute is incorrectly output as \r\r\n
> -----------------------------------------------------------
>
>          Key: XALANJ-2109
>          URL: http://issues.apache.org/jira/browse/XALANJ-2109
>      Project: XalanJ2
>         Type: Bug
>   Components: Serialization
>     Reporter: Brian Minchau
>     Assignee: Brian Minchau
>      Fix For: CurrentCVS
>  Attachments: ToHTMLStream.2109.patch.txt
>
> The serializer assumes that a single \n should be expanded to the system's end-of-line sequence. This is OK for text nodes, but not correct for HTML attributes, for the following reasons.
> Input XML document:
> <?xml version="1.0"?>
> <input 
>   data="xxx&#13;&#10;yyy" 
>   type="hidden" 
>   name="data.stuff" />
> Stylesheet:
> <?xml version="1.0"?> 
> <xsl:stylesheet 
>   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   version="1.0" >
>   <xsl:output method="html" />
>   <xsl:template match="input|br">
>     <xsl:copy>
>       <xsl:copy-of select="@*"/>
>       <xsl:apply-templates/>
>     </xsl:copy>
>   </xsl:template>
> </xsl:stylesheet>
> There are four stages of processing:
> A) what is in the input XML document
> B) what is presented to Xalan by the XML parser
> C) what is written out by the Xalan processor
> D) what is interpreted by a browser or user agent.
> The output Xalan produces at stage C) is this:
> <input data="xxx
> yyy" type="hidden" name="data.stuff">
> To show this more clearly, the value of the 'data' attribute
> written out on Windows is this:
> "xxx\r\r\nyyy"
> and on other operating systems the value written out is this:
> "xxx\r\nyyy"
> Xalan currently processes the attribute as follows:
>  - write out the \r as is
>  - treat the \n as a normalized end-of-line sequence produced by
>    the XML parser from stage A), and write it out
>    in stage C) as the system end-of-line
>    sequence, either \r\n or just \n depending on the operating system.
> The HTML recommendation, at 
>   http://www.w3.org/TR/html401/types.html#h-6.2
> says this about stage D):
> <<
> User agents should interpret attribute values as follows: 
>  1. Replace character entities with characters, 
>  2. Ignore line feeds, 
>  3. Replace each carriage return or tab with a single space. 
> >>
> Xalan's stage C) output on Windows, "xxx\r\r\nyyy", would be interpreted
> as "xxx  yyy" (two spaces) by a browser at stage D). Bullet 2 means that the
> browser would ignore the \n, and bullet 3 means that it would
> interpret \r\r as two spaces.
> Xalan's stage C) output on other operating systems,
> "xxx\r\nyyy", would be interpreted as "xxx yyy" by a browser at stage D),
> with one space fewer between "xxx" and "yyy".
> Since the browser's interpretation differs depending on which OS
> Xalan runs on, this is a bug: we shouldn't normalize
> the \n in the attribute value to the system end-of-line sequence.
> We should leave it alone, thus producing this output at stage C) on all operating systems:
> "xxx\r\nyyy"
> I ran this through Saxon 6.5.3 and its output was:
> <input data="xxx&#xA;yyy" type="hidden" name="data.stuff">
> When a browser interprets Saxon's output, it would apply
> bullet 1 and see a single newline character between "xxx" and "yyy".
> It is not clear whether bullets 1, 2 and 3 quoted from the HTML recommendation apply in sequence, or whether just one of them applies. If just one applies, the browser might interpret Saxon's 'data' attribute value as "xxx\nyyy". If instead bullet 1 is applied followed by bullet 2, Saxon's 'data' attribute value is interpreted as "xxxyyy". Either way, Xalan's output differs from Saxon's in a way that is significant to a browser or user agent.
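
To reproduce the test case quoted above from Java, the stylesheet and input document can be run through the standard JAXP API. This is a minimal sketch: the class name Xalanj2109Repro is hypothetical, and because JAXP selects the processor from the environment (typically Xalan-J if its jar is on the classpath, otherwise the JDK's built-in transformer), the exact characters written for the 'data' attribute depend on which serializer actually runs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Xalanj2109Repro {
    // Input document from the report; the attribute holds CR LF as character references.
    static final String INPUT =
        "<?xml version=\"1.0\"?>\n"
        + "<input data=\"xxx&#13;&#10;yyy\" type=\"hidden\" name=\"data.stuff\" />";

    // Stylesheet from the report: copy input/br elements together with all attributes.
    static final String STYLESHEET =
        "<?xml version=\"1.0\"?>\n"
        + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">\n"
        + "  <xsl:output method=\"html\"/>\n"
        + "  <xsl:template match=\"input|br\">\n"
        + "    <xsl:copy><xsl:copy-of select=\"@*\"/><xsl:apply-templates/></xsl:copy>\n"
        + "  </xsl:template>\n"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the input with whatever TransformerFactory JAXP selects. */
    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    /** Makes CR and LF visible so the serialized attribute characters can be compared. */
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(visible(run()));
    }
}
```

Printing through visible() turns each CR and LF into a literal \r or \n, which makes the difference between "xxx\r\r\nyyy" and "xxx\r\nyyy" easy to spot.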

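The stage C) and stage D) behaviour described in the issue can be modeled in a few lines of Java. This is a hypothetical sketch, not Xalan code: serializeBeforeFix mimics the pre-fix serializer (keep \r, expand each \n to the platform line separator), serializeAfterFix leaves the value alone as proposed, and interpret applies the three HTML 4.01 rules in sequence, which is only one of the two readings discussed in the issue. It also assumes that rule 1 only needs to decode numeric character references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of stages C) and D) from XALANJ-2109; not a Xalan API. */
public class AttributeNewlineModel {
    private static final Pattern CHAR_REF = Pattern.compile("&#([xX][0-9A-Fa-f]+|[0-9]+);");

    /** Pre-fix stage C): keep \r, expand each \n to the platform line separator. */
    static String serializeBeforeFix(String value, String lineSeparator) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\n') {
                sb.append(lineSeparator);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Proposed stage C): leave \r and \n in the attribute value alone. */
    static String serializeAfterFix(String value) {
        return value;
    }

    /** Stage D): the HTML 4.01 section 6.2 rules applied in sequence. */
    static String interpret(String attr) {
        // 1. Replace character references with characters (numeric ones only here).
        Matcher m = CHAR_REF.matcher(attr);
        StringBuffer decoded = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            int cp = Character.toLowerCase(ref.charAt(0)) == 'x'
                ? Integer.parseInt(ref.substring(1), 16)
                : Integer.parseInt(ref);
            m.appendReplacement(decoded, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(decoded);
        // 2. Ignore line feeds; 3. replace each carriage return or tab with a single space.
        return decoded.toString().replace("\n", "").replace('\r', ' ').replace('\t', ' ');
    }

    public static void main(String[] args) {
        String parsed = "xxx\r\nyyy"; // value the parser hands Xalan at stage B)
        System.out.println("[" + interpret(serializeBeforeFix(parsed, "\r\n")) + "]"); // [xxx  yyy]
        System.out.println("[" + interpret(serializeBeforeFix(parsed, "\n")) + "]");   // [xxx yyy]
        System.out.println("[" + interpret(serializeAfterFix(parsed)) + "]");           // [xxx yyy]
        System.out.println("[" + interpret("xxx&#xA;yyy") + "]");                       // [xxxyyy]
    }
}
```

The first two lines show the OS-dependent result the issue describes, the third shows that leaving the value alone gives the same result everywhere, and the last shows that the sequential reading turns Saxon's "xxx&#xA;yyy" into "xxxyyy".
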
-- 
This message is automatically generated by JIRA.