Posted to users@jackrabbit.apache.org by Joshua Levy <le...@csl.sri.com> on 2006/12/12 21:03:13 UTC

Invalid XML characters in export

Recently I happened to create a String property containing chars
below #x20.  The System View XML export then attempts to escape
such values as character references (e.g. &#0;).  However, most chars
in this range are not valid XML characters at all, so import fails with:
  javax.jcr.InvalidSerializedDataException: failed to parse XML stream:
Character reference "&#0" is an invalid XML character.
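
For reference, XML 1.0 allows only #x9, #xA and #xD below #x20. The following is a minimal sketch of a pre-export check based on the XML 1.0 "Char" production. The class and method names (XmlChars, isXmlChar, firstInvalidIndex) are hypothetical and not part of Jackrabbit or the JCR API.

```java
// Sketch only: XmlChars and its methods are hypothetical names, not Jackrabbit API.
final class XmlChars {
    private XmlChars() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first char that is not a valid XML 1.0 character, or -1 if none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // an unpaired surrogate is returned as-is
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);   // step over surrogate pairs as one unit
        }
        return -1;
    }
}
```

Calling XmlChars.firstInvalidIndex(value) before storing a String property would flag values that a later System View export could not round-trip.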

Of course, binary data should be in a binary property, but
if some binary data does somehow get into a String property,
the XML export appears to work yet produces output that cannot be imported.

Is there a way to deal with this issue? I wasn't able to find
much clarification in the spec on what the correct behavior
should be (String properties are supposed to
be like java.lang.Strings, but Sec 6.4.4 doesn't mention
ways to escape non-XML characters).

Regards,
Joshua
-- 
View this message in context: http://www.nabble.com/Invalid-XML-characters-in-export-tf2809830.html#a7840519
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Invalid XML characters in export

Posted by spamsucks <sp...@rhoderunner.com>.
This is crazy, but I have been working on this same issue all day (on
something unrelated to Jackrabbit).

I hit an error while trying to transfer a String containing a 0x19
character over SOAP (XFire, Axis).  The character causes the XML
serialization/deserialization to bomb.

I posted my problem on the XFire list and got the response below.  While it
does not solve your problem, I think it is related.  I am trying to "clean"
my string data so that this does not occur.


<<
The W3C's XML spec doesn't allow it. Please take a look at the XML
specification:

http://www.w3.org/TR/REC-xml/#charsets

As you can see, 0x19 is not included in the 'Char' list, and an XML parser
should neither accept nor generate it.

An XML conformance test suite explicitly ensures that the low (i.e. <0x20)
characters are rejected by the parser under test.
e.g. for Xerces see: 
http://xmlconf.sourceforge.net/xml/reports/report-xerces-cnv.html
>>
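
One way to "clean" string data along these lines is to drop every character outside the 'Char' production before serializing. The following is a minimal sketch; XmlCleaner and strip are hypothetical names, not part of XFire, Axis, or Jackrabbit, and silently removing characters is lossy, so it only suits data where those characters carry no meaning.

```java
import java.util.regex.Pattern;

// Sketch only: XmlCleaner is a hypothetical helper, not a library class.
final class XmlCleaner {
    // Matches anything outside the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    private XmlCleaner() {}

    /** Returns the input with every invalid XML 1.0 character removed. */
    static String strip(String s) {
        return INVALID.matcher(s).replaceAll("");
    }
}
```

Replacing the invalid characters with a visible placeholder instead of the empty string is an equally simple variation if the positions need to stay recognizable.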






Re: Invalid XML characters in export

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 12/13/06, Joshua Levy <le...@csl.sri.com> wrote:
> Jukka Zitting wrote:
> > Note that Jackrabbit does not yet implement this solution, so for now
> > the only workaround is to either avoid such characters in string
> > properties or to use custom import/export mechanisms.
>
> Aha, I see.  Thanks, Jukka.
>
> Roughly, is there an expected time frame for such
> maintenance draft features to get finalized and implemented?

I would expect Jackrabbit 1.3 to contain this and other JCR 1.0.1
fixes. A rough estimate for the 1.3 release is by the end of Q1
next year.

Please file a Jackrabbit bug report about this to better track the
progress on the issue.

BR,

Jukka Zitting

Re: Invalid XML characters in export

Posted by Joshua Levy <le...@csl.sri.com>.

Jukka Zitting wrote:
> 
>> Is there a way to deal with this issue? I wasn't able to find
>> much clarification on what the correct behavior should
>> be from the spec (String properties are supposed to
>> be like java.lang.Strings, but Sec 6.4.4 doesn't mention
>> ways to escape non-XML characters).
> 
> It's a known issue with JSR-170, invalid XML characters in string
> properties break the XML imports. See the JSR-170 maintenance draft
> for the proposed solution (use Base64 encoding for such string
> properties) to this issue.
> 
> Note that Jackrabbit does not yet implement this solution, so for now
> the only workaround is to either avoid such characters in string
> properties or to use custom import/export mechanisms.
> 

Aha, I see.  Thanks, Jukka.

Roughly, is there an expected time frame for such
maintenance draft features to get finalized and implemented?

Regards,
Joshua
-- 
View this message in context: http://www.nabble.com/Invalid-XML-characters-in-export-tf2809830.html#a7842936


Re: Invalid XML characters in export

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 12/12/06, Joshua Levy <le...@csl.sri.com> wrote:
> Recently I happened to create a String property containing chars
> below #x20.   The System View XML export then attempts to escape
> such values (e.g. as &#0; etc.).  However, most chars in this
> range are in fact not valid XML characters at all, so import fails
> javax.jcr.InvalidSerializedDataException: failed to parse XML stream:
> Character reference "&#0" is an invalid XML character.
>
> Of course, binary data should be in a binary property, but
> in the event some binary does somehow get into a String property,
> it means the XML export appears to work, but is actually not usable.
>
> Is there a way to deal with this issue? I wasn't able to find
> much clarification on what the correct behavior should
> be from the spec (String properties are supposed to
> be like java.lang.Strings, but Sec 6.4.4 doesn't mention
> ways to escape non-XML characters).

It's a known issue with JSR-170: invalid XML characters in string
properties break XML imports. See the JSR-170 maintenance draft
for the proposed solution to this issue (use Base64 encoding for
such string properties).

Note that Jackrabbit does not yet implement this solution, so for now
the only workaround is to either avoid such characters in string
properties or to use custom import/export mechanisms.
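
As an illustration of the custom import/export route, the following is a minimal sketch that Base64-encodes a string value only when it contains characters XML 1.0 cannot carry. It is not the maintenance draft's wording or Jackrabbit's implementation; StringValueCodec and its methods are hypothetical names, java.util.Base64 assumes Java 8 or later, and how the export marks a value as encoded is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

// Sketch only: StringValueCodec is a hypothetical helper for a custom
// export/import path; it is not Jackrabbit API.
final class StringValueCodec {
    // Matches anything outside the XML 1.0 "Char" production.
    private static final Pattern INVALID = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    private StringValueCodec() {}

    /** True if the value cannot be written as XML text and must be encoded. */
    static boolean needsBase64(String value) {
        return INVALID.matcher(value).find();
    }

    /** Encodes the UTF-8 bytes of the value as Base64 (safe XML text). */
    static String encode(String value) {
        // Note: unpaired surrogates do not survive UTF-8 encoding.
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(): Base64 text back to the original string. */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

A custom exporter would call needsBase64 for each string value, write encode(value) in place of the raw text when it returns true, and record that fact so the importer knows to call decode.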

BR,

Jukka Zitting