Posted to commits@daffodil.apache.org by "Dave Thompson (Jira)" <ji...@apache.org> on 2022/08/15 15:06:00 UTC

[jira] [Closed] (DAFFODIL-2708) XML String feature in XML Text Infoset Inputter/Outputter

     [ https://issues.apache.org/jira/browse/DAFFODIL-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Thompson closed DAFFODIL-2708.
-----------------------------------

Verified the specified commit (commit 3b213ce30b1974ecd9fc2260e4f081240da89874) is included in the latest pull from the daffodil repository.

Verified, via review, changes identified in commit comment were implemented.

Verified that all affected Daffodil subproject sbt test suites execute successfully, including the added test.

Verified the nightly test schemas compile and save successfully.

Verified the nightly test suite executes successfully.

> XML String feature in XML Text Infoset Inputter/Outputter
> ---------------------------------------------------------
>
>                 Key: DAFFODIL-2708
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2708
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: Back End
>    Affects Versions: 3.3.0
>            Reporter: Mike Beckerle
>            Priority: Critical
>             Fix For: 3.4.0
>
>
> Several users need a specific feature.
> The feature applies to XML output: a string that is known to itself be a string of XML text should be embeddable in the XML output from parsing without being escaped.
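
As an illustration of the parse direction described above, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than Daffodil itself. The element name "payload" and the sample value are hypothetical. It only contrasts the two output shapes (escaped text versus embedded XML) and is not how Daffodil implements the feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed value: a string that is itself well-formed XML.
payload = '<note id="1"><to>Bob</to></note>'

# Ordinary behaviour: the string is treated as text, so markup is escaped.
escaped = ET.Element("payload")
escaped.text = payload
escaped_xml = ET.tostring(escaped, encoding="unicode")

# Requested behaviour: the string is parsed and embedded as child elements.
embedded = ET.Element("payload")
embedded.append(ET.fromstring(payload))
embedded_xml = ET.tostring(embedded, encoding="unicode")

print(escaped_xml)
print(embedded_xml)
```

In the first form a consumer sees one long text node; in the second, the embedded markup is part of the output document's element structure.
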
> Symmetrically, for unparsing, a string element identified as XML text should result in a series of XML "events" being absorbed and converted to a string which is the ultimate value of the string element. 
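
The unparse direction can be sketched the same way, again with Python's standard library standing in for the infoset inputter. The helper name xml_string_value and the sample infoset are assumptions made for illustration. A parsed tree is used here in place of a stream of XML events, which is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset in which <payload> holds embedded XML, not text.
infoset = (
    "<record><id>7</id>"
    '<payload><note id="1"><to>Bob</to></note></payload>'
    "</record>"
)

def xml_string_value(infoset_xml, element_name):
    """Serialize the single child of `element_name` back into a string."""
    root = ET.fromstring(infoset_xml)
    holder = root.find(element_name)
    if holder is None:
        raise ValueError(f"no element named {element_name!r}")
    children = list(holder)
    if len(children) != 1:
        raise ValueError("expected exactly one embedded root element")
    # The serialized subtree becomes the value of the string element.
    return ET.tostring(children[0], encoding="unicode")

value = xml_string_value(infoset, "payload")
print(value)
```
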
> Note that for any given popular data format (XML, JSON, etc.) where Daffodil supports output of infosets in that representation, the same issue can arise where data contains a string which is already in that representation and users desire for it to be directly embedded, not escaped as a string. 
> For the purposes of this ticket, let's focus on XML only. Other representations could be added subsequently. 
> Notes:
> 1) on canonicalization - I see no way to avoid strong canonicalization of this XML. If byte-for-byte preservation of characters such as character entities like &#x20; (a space) or CRLFs is needed, there's just no way to do that (at least that I know of). 
> 2) XML initial slug line/processing instruction - a way to strip this if present in the XML string may be needed. An option to generate it as part of the string when unparsing may also be needed. 
> 3) An ASCII-only or iso-8859-1 only option may be needed where any character outside of those and standard whitespaces is converted to a character entity. 
> 4) This breaks the idea that the DFDL schema IS the XML Schema of the output Infoset XML from parsing. Rather, to create an XML schema for the resulting data, one would have to replace the DFDL element declaration for the string with an appropriate DFDL element reference to the schema of the XML being embedded at that place. 
> It is highly recommended that such a DFDL schema contain comments describing this exact element reference - namespace + name, that the XML String corresponds to. 
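
Notes 1 through 3 can be seen in a small round trip. This is a sketch with Python's standard xml.etree.ElementTree, not Daffodil, and the sample documents are made up. It shows that parsing and re-serializing drops the XML declaration, normalizes attribute spacing, and resolves the &#x20; character reference, and that serializing to an ASCII encoding converts other characters to character references.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML string with a declaration, odd spacing, and a char reference.
original = '<?xml version="1.0"?>\n<a  x = "1">one&#x20;two</a>'

# Notes 1 and 2: the round trip does not preserve the original characters.
elem = ET.fromstring(original)
round_tripped = ET.tostring(elem, encoding="unicode")

# Note 3: an ASCII-only serialization turns non-ASCII into character references.
accented = ET.fromstring("<a>caf\u00e9</a>")
ascii_bytes = ET.tostring(accented, encoding="us-ascii")

print(round_tripped)
print(ascii_bytes.decode("ascii"))
```

The round-tripped string is equivalent XML but not the same sequence of characters, which is the loss note 1 describes.
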
> w.r.t. implementation...
> There's some pseudocode in the "Example Implementation" section of
> the Runtime Properties proposal:
> [https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties#Proposal%3ARuntimeProperties-ExampleImplementation]
> This pseudocode uses the ScalaXML InfosetInputter/Outputter as a base for simplicity, but we should base the actual one on the XMLTextInfosetInputter/Outputter
> since that's what most people use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)