Posted to commits@daffodil.apache.org by mb...@apache.org on 2019/04/05 00:43:32 UTC

[incubator-daffodil-site] branch asf-site updated: Publishing from 3a4708345a723b264aa6a666c67089c40d4c741a

This is an automated email from the ASF dual-hosted git repository.

mbeckerle pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-daffodil-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 36e5d6c  Publishing from 3a4708345a723b264aa6a666c67089c40d4c741a
36e5d6c is described below

commit 36e5d6c75069467ccfbafaa4d8e4282a037c4af2
Author: Michael Beckerle <mb...@tresys.com>
AuthorDate: Thu Apr 4 20:42:46 2019 -0400

    Publishing from 3a4708345a723b264aa6a666c67089c40d4c741a
---
 content/infoset/index.html | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/content/infoset/index.html b/content/infoset/index.html
index 85634df..6b6c392 100644
--- a/content/infoset/index.html
+++ b/content/infoset/index.html
@@ -534,16 +534,16 @@ but extended to handle all the XML 1.0 illegal characters including those
 with 16-bit codepoint values. This mapping is used bi-directionally, that is,
 illegal characters are replaced by their legal counterparts when parsing, and
 the reverse transformation is performed when unparsing, thereby allowing the
-creation of data containing the XML illegal characters from legal XML
+creation of data streams containing the XML illegal characters from legal XML
 documents that contain only the mapped PUA corresponding characters.</p>
 
 <p>These are the legal XML characters (for XML v1.0):</p>
 
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#x0 | #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] | #xD (treated specially)
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] 
 </code></pre></div></div>
-
-<p>Illegal characters from <code class="highlighter-rouge">#x00</code> to <code class="highlighter-rouge">#x1F</code> are mapped to the PUA
-by adding <code class="highlighter-rouge">#xE000</code> to their character code.</p>
+<p>All other characters are illegal.
+Illegal characters from <code class="highlighter-rouge">#x00</code> to <code class="highlighter-rouge">#x1F</code> are mapped to the PUA
+by adding <code class="highlighter-rouge">#xE000</code> to their character code. Hence, the NUL (#x0) character code becomes #xE000.</p>
 
 <p>Illegal characters from <code class="highlighter-rouge">#xD800</code> to <code class="highlighter-rouge">#xDFFF</code> are mapped to the PUA by adding
 <code class="highlighter-rouge">#x1000</code> to their character code. So <code class="highlighter-rouge">#xD800</code> maps to <code class="highlighter-rouge">#xE800</code>, and
@@ -553,16 +553,18 @@ by adding <code class="highlighter-rouge">#xE000</code> to their character code.
 subtracting <code class="highlighter-rouge">#x0F00</code> from their character code, so to characters <code class="highlighter-rouge">#xF0FE</code>
 and <code class="highlighter-rouge">#xF0FF</code>.</p>
 
-<p>Character <code class="highlighter-rouge">#xD</code> (Carriage Return or CR) is mapped to <code class="highlighter-rouge">#xA</code> (Line Feed, or
+<p>The legal character <code class="highlighter-rouge">#xD</code> (Carriage Return or CR) is mapped to <code class="highlighter-rouge">#xA</code> (Line Feed, or
 LF). The CR character is allowed in the textual representation of XML
 documents, but is always converted to LF in the XML Infoset. That is, it is
 read by XML processors, but CRLF is converted to just LF, and CR alone is
 converted to LF. Daffodil is in a sense a different ‘reader’ of data into the
 XML infoset, so to be consistent with XML we map CR and CRLF to LF.</p>
 
-<p>It is a processing error when parsing if any DFDL infoset string contains
+<p>It is a processing error when parsing if the data-stream contains
 characters in the parts of the PUA used by this mapping for illegal XML
-codepoints.</p>
+codepoints. When unparsing, PUA characters such as #xE000 found in infoset string values are mapped back to the corresponding illegal code points (#xE000 becomes #x0, aka NUL).</p>
+
+<p>The XML for an infoset can conveniently embed the #xE000 character, or any of the other “illegal” characters mapped into the PUA, by using numeric character references such as “&amp;#xE000;”. The reference is turned into the #xE000 code point when the XML document is loaded. When unparsing, Daffodil then maps it to #x0 (aka NUL).</p>
 
 <p>It is a processing error if any DFDL infoset string character is created with a
 character code greater than <code class="highlighter-rouge">#x10FFFF</code>.</p>
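
For readers of this commit, the remapping rules described in the changed page can be sketched roughly as follows. This is only an illustrative Python sketch, not Daffodil's implementation: the function names (to_pua, from_pua, parse_string, unparse_string, unparse_xml_text) are hypothetical, the #xFFFE/#xFFFF source range is inferred from the target values #xF0FE and #xF0FF visible in the diff, and treating only the images of illegal characters as "reserved" PUA code points is an assumption.

```python
import xml.etree.ElementTree as ET

# The only code points below #x20 that XML 1.0 allows.
LEGAL_CONTROLS = {0x9, 0xA, 0xD}


def to_pua(cp: int) -> int:
    """Map an XML-illegal code point into the PUA; legal ones pass through."""
    if cp < 0x20 and cp not in LEGAL_CONTROLS:
        return cp + 0xE000          # e.g. NUL (#x0) becomes #xE000
    if 0xD800 <= cp <= 0xDFFF:
        return cp + 0x1000          # e.g. #xD800 becomes #xE800
    if cp in (0xFFFE, 0xFFFF):
        return cp - 0x0F00          # becomes #xF0FE / #xF0FF (inferred range)
    return cp


def from_pua(cp: int) -> int:
    """Reverse of to_pua, applied when unparsing."""
    if 0xE000 <= cp <= 0xE01F and (cp - 0xE000) not in LEGAL_CONTROLS:
        return cp - 0xE000
    if 0xE800 <= cp <= 0xEFFF:
        return cp - 0x1000
    if cp in (0xF0FE, 0xF0FF):
        return cp + 0x0F00
    return cp


def is_reserved_pua(cp: int) -> bool:
    """Assumption: a PUA code point is reserved if it is the image of an illegal one."""
    return from_pua(cp) != cp


def parse_string(data: str) -> str:
    """Data stream text -> infoset string value."""
    for ch in data:
        if is_reserved_pua(ord(ch)):
            # The page calls this a processing error; ValueError stands in for it.
            raise ValueError(f"data contains reserved PUA code point U+{ord(ch):04X}")
    # CRLF and lone CR both become LF, as in the XML Infoset.
    data = data.replace("\r\n", "\n").replace("\r", "\n")
    return "".join(chr(to_pua(ord(ch))) for ch in data)


def unparse_string(value: str) -> str:
    """Infoset string value -> data stream text (CR is not restored)."""
    return "".join(chr(from_pua(ord(ch))) for ch in value)


def unparse_xml_text(xml_text: str) -> str:
    """Load a one-element XML document and unparse its text content."""
    return unparse_string(ET.fromstring(xml_text).text or "")
```

Under these assumptions, parse_string("a\x00b") yields a string containing U+E000 in place of NUL, and unparse_xml_text("<v>a&#xE000;b</v>") yields "a\x00b", which mirrors the numeric character reference case the new paragraph describes.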