Posted to dev@tuscany.apache.org by Thomas Gentsch <tg...@e-tge.de> on 2013/02/19 16:32:16 UTC

SDO and XML escapes (Java and C++)

Hello all,

we are using both C++ and Java SDO in a project and discovered some
misbehavior in the C++ components with XML data converted from/to SDO if
the XML contains either escaped chars or CDATA. Java seems to do it
mostly right (see below).

When looking at the SDO (C++ M3) code and searching on the web (e.g.
[1]), it looks as if this topic is a bit, well, incomplete in the C++
world.

The problem (C++):
- loading an XML with CDATA inside works nicely, the CDATA remains
  intact, so saving works nicely too. However, if I do a
  DataObjectPtr->getCString(), I get the CDATA markup in the returned
  value, which means that I, as a user, have to deal with it :-|
- loading an XML with escaped chars (e.g. &lt;) works too, since libxml2
  converts them. getCString() returns the real text (e.g. "<"), but
  saving does not re-insert the escaping, i.e. the resulting XML is
  not well-formed anymore (TUSCANY-1553)
In Java this works much better and pretty much as I'd expect:
- loading XML with either construct works
- getString() just returns the real text, with the escaped
  sections converted
- saving works too: the CDATA sections are lost, but their content is
  converted back to escaped XML. This is not the *original* XML anymore,
  but at least it is valid and logically the same as the input
- Example:
  Input XML:
     <tns1:name>&#252;&lt;&gt;bla blub <![CDATA[ <<>> ]]></tns1:name>
  getString() in Java:
     "ü<>bla blub  <<>> "
  Saving this as XML:
     <tns1:name>ü&lt;>bla blub  &lt;&lt;>> </tns1:name>
  The only questionable thing is the saved "ü": should it be
  converted back to &uuml; or &#252;?

Anyway, now the question: there seem to have been discussions when
SDO C++ was implemented. Was the approach above (as in Java) ever
considered and, if so, why was it not followed?
I believe that this would also have been much simpler than the current code:
- while parsing
  - the cdata handler function of the SAX2 handler just 
    appends the text returned by libxml2
  - escaped chars are converted by libxml2 anyway
- the property value then contains the real text
  (e.g. "ü<>bla blub  <<>> "), which getCString() returns as-is
- setting that property also just stores the passed-in value
- saving the property just calls libxml2's xmlTextWriterWriteString(),
  which should escape the special chars

Another advantage is that users don't need to worry about (un)escaping
special chars or CDATA, as they do today.

Any insight is very welcome!
Regards,
  tge


Re: SDO and XML escapes (Java and C++)

Posted by Thomas Gentsch <tg...@e-tge.de>.
I suppose nobody really cares :-), but I entered a Jira issue anyway
(TUSCANY-4075).
I looked further into potential side effects and commented in
Jira. I was not sure about sequences and lists, but those should be safe
too (and at least no worse than today).

I tested it and did not find any problems so far.

Still, if anyone has comments or opinions, I'd be glad to hear them!
Rgds,
  tge

On Tue, 2013-02-19 at 16:32 +0100, Thomas Gentsch wrote:
> [...]