Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2022/11/17 16:33:00 UTC

[jira] [Commented] (DAFFODIL-2752) SAX to SAX implementation of stringAsXML feature to enable EXI to still work

    [ https://issues.apache.org/jira/browse/DAFFODIL-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635432#comment-17635432 ] 

Mike Beckerle commented on DAFFODIL-2752:
-----------------------------------------

Per email with [~stevedlawrence] (who implemented the first version of stringAsXML) in response to this "crazy idea":

 

I *think* it would actually be fairly straightforward, though there are
definitely some things that make it a bit tricky.

On the parse side, the SAXInfosetOutputter just needs to check for the
stringAsXml runtime property, create a new XMLReader with the same
ContentHandler the SAXInfosetOutputter is already writing to, and call
XMLReader.parse(). That will parse the string content, convert it to SAX
events, and send them to the ContentHandler. The ContentHandler won't
know any different.
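
A minimal sketch of that step using only the JDK's SAX classes (this is not
Daffodil code; the class name StringToSaxSketch and the helper names emit and
trace are made up for illustration). It parses a string with a fresh
XMLReader and delivers the events to a ContentHandler that already exists:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; none of these names exist in Daffodil.
class StringToSaxSketch {

    // Parse xmlText and send the resulting SAX events to an existing handler.
    static void emit(String xmlText, ContentHandler downstream) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(downstream);
        reader.parse(new InputSource(new StringReader(xmlText)));
    }

    // Stand-in for the "real" ContentHandler: records document and element events.
    static List<String> trace(String xmlText) throws Exception {
        List<String> events = new ArrayList<>();
        emit(xmlText, new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String local, String qName,
                                               Attributes atts) {
                events.add("start:" + local);
            }
            @Override public void endElement(String uri, String local, String qName) {
                events.add("end:" + local);
            }
        });
        return events;
    }
}
```

Note that the recorded events include startDocument and endDocument, which is
the problem subtlety 2 below is about.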

A couple subtleties that come to mind:

1. We need to create the <stringAsXml xmlns=""> element in SAX to wrap
the XML content; that should be easy enough.

2. We need to convince the XMLReader to not send start/endDocument
events when it reads the stringAsXml content. The first thing that comes
to my mind is to make this XMLReader actually send events to a "proxy"
ContentHandler, which just forwards all events to the real
ContentHandler except for the start/endDocument events. There are probably
other alternatives (e.g. override some XMLReader functions), but it
feels like it should be doable without too much pain.

3. Need to consider how errors would work. But I guess the XMLReader
would throw SAXExceptions, which the SAXInfosetOutputter could just
rethrow, so maybe it's straightforward.
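
The three points above could be sketched roughly as follows, again with plain
JDK SAX classes rather than Daffodil's. The names WrappedStringAsXmlSketch,
NoDocumentEvents, emitWrapped, and trace are hypothetical. Two choices here
are my own assumptions and not part of the proposal: the proxy also drops
setDocumentLocator, and xmlns="" is expressed as a prefix mapping of the
empty prefix to the empty URI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical illustration only; none of these names exist in Daffodil.
class WrappedStringAsXmlSketch {

    // Proxy handler: XMLFilterImpl forwards every event to the real handler,
    // except the ones overridden here to do nothing.
    static class NoDocumentEvents extends XMLFilterImpl {
        NoDocumentEvents(ContentHandler real) { setContentHandler(real); }
        @Override public void startDocument() { }
        @Override public void endDocument() { }
        // Assumption: a locator arriving mid-document is also unwanted.
        @Override public void setDocumentLocator(Locator locator) { }
    }

    // Emit <stringAsXml xmlns="">...</stringAsXml> around the parsed string.
    // A SAXException from the parser simply propagates to the caller.
    static void emitWrapped(String xmlText, ContentHandler real)
            throws SAXException, IOException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new NoDocumentEvents(real));

        real.startPrefixMapping("", ""); // assumed SAX form of xmlns=""
        real.startElement("", "stringAsXml", "stringAsXml", new AttributesImpl());
        reader.parse(new InputSource(new StringReader(xmlText)));
        real.endElement("", "stringAsXml", "stringAsXml");
        real.endPrefixMapping("");
    }

    // Stand-in for the real ContentHandler: records document and element events.
    static List<String> trace(String xmlText) throws Exception {
        List<String> events = new ArrayList<>();
        emitWrapped(xmlText, new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String local, String qName,
                                               Attributes atts) {
                events.add("start:" + local);
            }
            @Override public void endElement(String uri, String local, String qName) {
                events.add("end:" + local);
            }
        });
        return events;
    }
}
```

In this sketch a parse error leaves the wrapper element unclosed, since the
exception escapes before endElement is sent; whether that matters depends on
how the caller treats the failure.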


I think the unparse side gets a bit trickier however (that always seems
to be the case). The way SAX unparse works is the
DaffodilUnparseContentHandler receives SAX events and converts them to
an array of objects that the SAXInfosetInputter expects and can use to
implement the InfosetInputter API. One complication is that these
objects are pretty specific to the InfosetInputter API. For example,
there's no way to represent mixed content in these objects because it's
not allowed in the infoset, and I'm not sure of a good way to allow that
without making these objects significantly more complex.

One way around this would be to have the DaffodilUnparseContentHandler
detect stringAsXml elements and switch to a mode where it just consumes
all the SAX events, converts them to an XML string, and buffers them as the
simple content. And when it gets the closing stringAsXml element, it
creates one of the objects for the SAXInfosetInputter with the XML as
the simple content.
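
A rough sketch of that mode switch, using a JDK TransformerHandler to turn
the buffered SAX events back into text. This is not the
DaffodilUnparseContentHandler; BufferingUnparseSketch, captured, and capture
are hypothetical names, and the captured list stands in for whatever object
would actually be handed to the SAXInfosetInputter. It switches on the
element name alone, and it ignores text directly inside the wrapper but
outside the wrapped root element, which is an assumption on my part.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; none of these names exist in Daffodil.
class BufferingUnparseSketch extends DefaultHandler {
    final List<String> captured = new ArrayList<>(); // one XML string per wrapper
    private TransformerHandler serializer;           // non-null only while buffering
    private StringWriter buffer;
    private int depth;                                // element depth inside the wrapper

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (serializer != null) {
            depth++;
            serializer.startElement(uri, local, qName, atts);
        } else if ("stringAsXml".equals(local)) {
            startBuffering();
        } // otherwise: normal infoset event handling would go here
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (serializer == null) return;
        if (depth == 0) { // closing tag of the wrapper itself
            serializer.endDocument();
            captured.add(buffer.toString());
            serializer = null;
        } else {
            depth--;
            serializer.endElement(uri, local, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Mixed content inside the wrapped root is kept as-is.
        if (serializer != null && depth > 0) serializer.characters(ch, start, length);
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        if (serializer != null) serializer.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        if (serializer != null) serializer.endPrefixMapping(prefix);
    }

    private void startBuffering() throws SAXException {
        try {
            SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            serializer = stf.newTransformerHandler();
        } catch (TransformerConfigurationException e) {
            throw new SAXException(e);
        }
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        buffer = new StringWriter();
        serializer.setResult(new StreamResult(buffer));
        serializer.startDocument();
        depth = 0;
    }

    // Convenience driver: parse a document and return the captured XML strings.
    static List<String> capture(String document) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        BufferingUnparseSketch handler = new BufferingUnparseSketch();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(document)));
        return handler.captured;
    }
}
```

Tracking depth is what lets the sketch tell the wrapper's own closing tag
apart from a nested element that happens to share the name.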

A downside to this approach is the DaffodilUnparseContentHandler doesn't
have access to the runtimeProperties to know if an element should have
the special stringAsXml logic and know when to mode switch. However, I
guess it could just switch when it gets a SAX event with a startElement
with name "stringAsXml", but that would break for schema elements that
actually have that name but aren't one of these stringAsXml things and
are just coincidentally named. That doesn't feel very likely though, and
we could always make the name more unique if that was a concern. There
might be another approach, but that's the first thing that comes to my mind.

So all that to say, I don't think it's actually that crazy, and probably
wouldn't be a ton of work. Probably approximately the same as the
changes to the XMLTextInfosetInputter/Outputter, maybe a bit more
complex on unparse. And we can reuse some of that logic and experience
so we aren't starting from scratch.

> SAX to SAX implementation of stringAsXML feature to enable EXI to still work
> ----------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2752
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2752
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Performance, SAX
>    Affects Versions: 3.4.0
>            Reporter: Mike Beckerle
>            Priority: Major
>
> The stringAsXML feature is implemented in the XML Text Infoset Inputter/Outputter.
> This makes it impossible to use it along with EXI as the Infoset representation.
> If this was done as a SAX to SAX transformation instead then it would be compatible with EXI.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)