Posted to c-dev@xerces.apache.org by "Ian Young (JIRA)" <xe...@xml.apache.org> on 2016/04/07 12:06:25 UTC

[jira] [Comment Edited] (XERCESC-2065) Carriage return entities are not handled properly

    [ https://issues.apache.org/jira/browse/XERCESC-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230033#comment-15230033 ] 

Ian Young edited comment on XERCESC-2065 at 4/7/16 10:05 AM:
-------------------------------------------------------------

There are two parts to the original report: the first is the apparent removal of the "text", which I will leave Scott to come back on.

The more important part is that in the output, the #xD is not re-expressed as "&#13;" but is left as a "bare" CR character. This is problematic because if that output is then read in *again*, the result will not be identical to the original document: the parser will normalise the bare CR to a line feed as part of its end-of-line handling.

The '<' and '&' are obviously being handled correctly.
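
To illustrate the round-trip concern, here is a minimal sketch in Python rather than Xerces-C++, using the standard library's expat-based parser. The helper escape_text is hypothetical and written only for this example; it is not the Xerces-C++ serializer, and the sketch assumes only that the parser applies the usual XML end-of-line normalisation to literal CR characters.

```python
import xml.etree.ElementTree as ET


def escape_text(text: str) -> str:
    """Escape character data so that a carriage return survives re-parsing.

    Hypothetical helper for illustration only; not taken from Xerces-C++.
    """
    return (text.replace("&", "&amp;")   # must come first to avoid double escaping
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\r", "&#13;"))  # CR as a character reference


# The character data as it exists after the first parse.
original = "text\rmore<&"

# Serialised with '<' and '&' escaped but the CR left bare,
# which is the behaviour described in the report.
bare = "<foo>" + original.replace("&", "&amp;").replace("<", "&lt;") + "</foo>"
reparsed_bare = ET.fromstring(bare.encode("utf-8")).text

# Serialised with the CR re-expressed as a character reference.
escaped = "<foo>" + escape_text(original) + "</foo>"
reparsed_escaped = ET.fromstring(escaped.encode("utf-8")).text

print(repr(reparsed_bare))     # 'text\nmore<&'  (CR became LF)
print(repr(reparsed_escaped))  # 'text\rmore<&'  (CR preserved)
```

Only the second form gives back the original character data on re-reading, which is why the serializer would need to emit the character reference.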


> Carriage return entities are not handled properly
> -------------------------------------------------
>
>                 Key: XERCESC-2065
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2065
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: DOM, Non-Validating Parser, SAX/SAX2
>    Affects Versions: 3.1.3
>            Reporter: Scott Cantor
>            Priority: Critical
>
> Documents with CR entities don't seem to round trip properly in the parser if you parse them and then serialize them. It's possible the bug is in the serializer because signed documents don't end up with corrupt signatures, but that may be due to insufficient testing as of yet.
> A simple example:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <foo>
>    text&#13;more&lt;&amp;
> </foo>
> {code}
> Running that through DOMPrint or SAX2Print:
> {code}
> <foo>
> more&lt;&amp;
> </foo>
> {code}
> Notice the CR entity is removed, but also all of the characters immediately in front of it.


