Posted to dev@tuscany.apache.org by "Thomas Gentsch (JIRA)" <de...@tuscany.apache.org> on 2013/02/21 11:52:12 UTC

[jira] [Created] (TUSCANY-4075) SDO C++ handling of XML CDATA and escaped chars inconsistent

Thomas Gentsch created TUSCANY-4075:
---------------------------------------

             Summary: SDO C++ handling of XML CDATA and escaped chars inconsistent
                 Key: TUSCANY-4075
                 URL: https://issues.apache.org/jira/browse/TUSCANY-4075
             Project: Tuscany
          Issue Type: Bug
          Components: C++ SDO
    Affects Versions: Cpp-M3
            Reporter: Thomas Gentsch


We are using both C++ and Java SDO in a project and discovered some
misbehavior in the C++ components when XML data is converted from/to SDO
and the XML contains either escaped chars or CDATA. Java seems to do it
mostly right (see below).

Judging from the SDO (C++ M3) code and from searching the web (e.g.
[1]), this topic seems to have been left a bit, well, incomplete in the
C++ world.

The problem (C++):
- loading an XML with CDATA inside works nicely, the CDATA remains
  intact, therefore saving works nicely too. However, if I call
  DataObjectPtr->getCString(), I get the CDATA section in the returned
  value, which means that as a user I have to deal with it myself :-|
- loading an XML with escaped chars (e.g. &lt;) works too, libxml2
  converts these chars. getCString() returns the real text (e.g. "<"),
  but saving does not re-insert the escaping, i.e. the resulting XML is
  not usable anymore (TUSCANY-1553)
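
To illustrate what "dealing with that" means for the first point, here is a
minimal sketch of the kind of helper a user currently has to write. It assumes
that the value returned by getCString() contains the literal "<![CDATA[" and
"]]>" markers, as described above. stripCData is a hypothetical name and not
part of the SDO API.

```cpp
#include <string>

// Hypothetical user-side helper (NOT part of the SDO C++ API):
// replaces every "<![CDATA[ ... ]]>" section in `in` with its raw content.
std::string stripCData(const std::string& in) {
    static const std::string kOpen = "<![CDATA[";
    static const std::string kClose = "]]>";
    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        std::string::size_type start = in.find(kOpen, pos);
        if (start == std::string::npos) break;
        std::string::size_type end = in.find(kClose, start + kOpen.size());
        if (end == std::string::npos) break;  // unterminated section: keep the rest verbatim
        out.append(in, pos, start - pos);     // text before the section
        out.append(in, start + kOpen.size(),
                   end - start - kOpen.size());  // content of the section
        pos = end + kClose.size();
    }
    out.append(in, pos, std::string::npos);   // remaining text after the last section
    return out;
}
```

A caller would wrap every read, e.g. stripCData(obj->getCString("name")),
where obj and the property name are placeholders.
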
In Java this looks much better and pretty much as I'd expect it to:
- loading XML with either construct works
- using getCString() just returns the real text with the escaped
  sections converted
- saving works too: CDATA sections are lost, but their content is
  converted to escaped XML. This is not the *original* XML anymore, but
  at least it is valid and logically the same as the input
- Example:
  Input XML:
     <tns1:name>&#252;&lt;&gt;bla blub <![CDATA[ <<>> ]]></tns1:name>
  getCString() in Java:
     "ü<>bla blub  <<>> "
  Saving this as XML:
     <tns1:name>ü&lt;>bla blub  &lt;&lt;>> </tns1:name>
  The only questionable thing is the saved "ü": should it be
  converted back to &uuml; or &#252;?
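
The save step in this example can be reproduced with a very small escaping
routine. The sketch below is only an illustration of the observed output, not
the Java implementation: escapeXmlText is a hypothetical name, and it
deliberately leaves ">" untouched because that is what the saved XML above
shows. On the "ü" question: XML itself predefines only the entities lt, gt,
amp, apos and quot, so &uuml; would need a DTD declaration, while a literal
"ü" is fine as long as the document encoding can represent it.

```cpp
#include <string>

// Hypothetical helper: escapes the characters that must not appear
// literally in XML element content. '>' is left as-is, matching the
// saved output shown in the example above (a strict serializer would
// still have to escape it when it follows "]]").
std::string escapeXmlText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            default:  out += c;       break;  // includes the UTF-8 bytes of "ü"
        }
    }
    return out;
}
```
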

Anyway, now the question: as there seem to have been discussions going
on when SDO C++ was implemented, has the approach above (as in Java)
ever been considered and, if so, why was it not followed?
I believe that it would also have been much simpler than what exists today:
- while parsing
  - the cdata handler function of the SAX2 handler just 
    appends the text returned by libxml2
  - escaped chars are converted by libxml2 anyway
- the property value now contains the real text
  (e.g. "ü<>bla blub  <<>> ") and returns it just as-is in getCString()
- setting that property also just sets the passed-in value
- saving the property just calls libxml2 xmlTextWriterWriteString() 
  which should escape the special chars
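
The parsing side of this proposal can be sketched without libxml2 as
follows. The two functions are modeled on the shape of libxml2's SAX2
`characters` and `cdataBlock` callbacks (context pointer, buffer, length),
but they use plain char instead of xmlChar, and PropertyText is a
hypothetical stand-in for wherever SDO stores the property value. Treat it
as an illustration of the idea under these assumptions, not as a patch.

```cpp
#include <string>

// Hypothetical stand-in for the SDO property value being built.
struct PropertyText {
    std::string value;
};

// Modeled on the SAX2 "characters" callback: the parser has already
// resolved entity and character references, so the text is appended as-is.
void onCharacters(void* ctx, const char* ch, int len) {
    static_cast<PropertyText*>(ctx)->value.append(
        ch, static_cast<std::string::size_type>(len));
}

// Modeled on the SAX2 "cdataBlock" callback: the parser passes only the
// content of the section (no markers), so it is handled like normal text.
void onCDataBlock(void* ctx, const char* value, int len) {
    onCharacters(ctx, value, len);
}
```

With this, getCString() would simply return PropertyText::value, and saving
would pass that same string to xmlTextWriterWriteString() and leave the
escaping to libxml2.
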

Another advantage is that users would not need to worry about (un)escaping
special chars or CDATA as they do today. Disadvantage: the API behavior changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira