Posted to users@cxf.apache.org by qvall <qv...@o2.pl> on 2007/12/30 00:29:33 UTC

Illegal characters in xml

Hi, 
Is there any way to translate illegal characters that are in the xml
message? 
I'm receiving 
javax.xml.ws.soap.SOAPFaultException: Unmarshalling Error: 
Illegal character ((CTRL-CHAR, code 7))
and I'm wondering why text nodes are not wrapped in CDATA sections.
Do you know of any simple workaround that avoids handling this manually?

One last simple question: is there any way to make LoggingInInterceptor
print formatted output, i.e. with proper indentation?

patrick
-- 
View this message in context: http://www.nabble.com/Illegal-characters-in-xml-tp14542696p14542696.html
Sent from the cxf-user mailing list archive at Nabble.com.


Re: Illegal characters in xml

Posted by Benson Margulies <bi...@gmail.com>.
Once upon a time, in a galaxy far away, the W3C defined XML 1.0. If you
read the spec for XML 1.0, you will find, perhaps to your astonishment,
that some Unicode characters were banished, altogether, from XML. Not
relegated to entities, not prohibited in tag names. Forbidden.
Altogether. I am not aware of the logic of this decision, except that I
have a vague sense that the message was that control characters are
obsolete.

Some XML processors ignore this aspect of the spec. Some don't. By
default, CXF ends up using one that is picky. It has a lot of other
attractive characteristics ...

Meanwhile, the whole collection of JAX specs, not to mention all sorts
of other specifications, were written on the apparent assumption that a
Java String (or any other vector of Unicode characters) could be
parcel-posted across the known universe in an xsd:string. Sadly, it is
not so. It would have been good if these specs had offered some scheme
for packaging up these control characters, but they did not.

I'm not enough of a JAXB expert to tell you if there's a snail(@) that
will instruct JAXB to send some singular String property as a
base64-encoded item. 

I will tell you that sending multiple MBs of data in the inline XML is
probably not a best practice, and attachments were invented as a better
solution. Attachments aren't XML if you don't make them XML, and thus
you don't hit this situation.

In an Aegis mapping XML file, you should be able to ask for a property
to be forced to the xsd:base64Binary type.





Re: Illegal characters in xml

Posted by qvall <qv...@o2.pl>.


>On Sun, 2007-12-30 at 07:25 -0800, qvall wrote:
>> Thanks for the clarification. How can I make CXF use base64 encoding for
>> the offending strings, then? Is there any method I should read up on in
>> particular? Or do I have to encode them manually?


>You have to force the use of a base64 data type. We don't have an
>automatic scheme for this that I know of. I'd do it by declaring byte[]
>instead of String. Are you using Aegis or JAXB?

Currently I'm using JAXB, but I'm considering switching to Aegis because
of its better inheritance support (I just need to solve another problem
with that first). I guess this can't be done transparently, then, without
resorting to modifying the data model?

>If the data is larger than small, I think I should be pushing you to
>consider MTOM or some other attachment scheme.

The content that contains these illegal characters is extracted
from PDF and DOC documents that can be up to 5 MB. I guess
I will have to read up on that, then.
In the meantime I've added encryption, hoping this would solve
my problem, but unfortunately it didn't.

BTW, I'm wondering whether there is any particular reason why these
characters are forbidden.

Thanks,
Patrick






Re: Illegal characters in xml

Posted by Benson Margulies <bi...@gmail.com>.


On Sun, 2007-12-30 at 07:25 -0800, qvall wrote:
> Thanks for the clarification. How can I make CXF use base64 encoding for
> the offending strings, then? Is there any method I should read up on in
> particular? Or do I have to encode them manually?


You have to force the use of a base64 data type. We don't have an
automatic scheme for this that I know of. I'd do it by declaring byte[]
instead of String. Are you using Aegis or JAXB?
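
A minimal sketch of the byte[] approach, assuming a payload bean of your own. The class name ExtractedText and its helper methods are hypothetical, not part of CXF or JAXB. JAXB maps a byte[] property to xsd:base64Binary by default, so the control characters travel as base64 text instead of raw XML characters. The sketch uses only the JDK so that it runs on its own:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical payload bean: the text is carried as bytes, not as a String. */
final class ExtractedText {
    // A byte[] property is what a binding such as JAXB writes out as base64.
    private byte[] content;

    public byte[] getContent() { return content; }
    public void setContent(byte[] content) { this.content = content; }

    // Helpers deliberately not named get*/set*, so that a bean-style binding
    // does not treat them as a second property.
    static ExtractedText fromText(String text) {
        ExtractedText t = new ExtractedText();
        t.setContent(text.getBytes(StandardCharsets.UTF_8));
        return t;
    }

    String asText() {
        return new String(content, StandardCharsets.UTF_8);
    }

    /** Roughly what ends up inside the element: plain base64 characters. */
    String wireForm() {
        return Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        ExtractedText t = fromText("a\u0007b"); // contains CTRL-CHAR code 7
        System.out.println(t.wireForm());       // only base64 characters
        System.out.println(t.asText().equals("a\u0007b"));
    }
}
```

The receiving side calls asText() (or decodes the bytes itself) to get the original string back, control characters included. This does change the data model, and for multi-megabyte content an attachment is still likely the better fit.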

If the data is larger than small, I think I should be pushing you to
consider MTOM or some other attachment scheme.



Re: Illegal characters in xml

Posted by qvall <qv...@o2.pl>.
Thanks for the clarification. How can I make CXF use base64 encoding for
the offending strings, then? Is there any method I should read up on in
particular? Or do I have to encode them manually?





Re: Illegal characters in xml

Posted by Benson Margulies <bi...@gmail.com>.
CDATA doesn't help with illegal characters. They can't be in XML at all: not
in CDATA, not as &# character references. You need to use attachments or
base64 if you need to send them around.
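
To see this for yourself, here is a small sketch that feeds the same control character to the JDK's built-in parser in three forms: as plain text, inside CDATA, and as a character reference. The class and method names are made up for illustration, and the parser CXF uses may word its errors differently, but a conforming XML 1.0 parser should reject all three:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class IllegalCharDemo {
    /** Returns true if the JDK's default parser accepts the document. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bel = "\u0007"; // CTRL-CHAR, code 7
        System.out.println("plain text ok: " + parses("<a>ok</a>"));
        System.out.println("raw BEL:       " + parses("<a>" + bel + "</a>"));
        System.out.println("BEL in CDATA:  " + parses("<a><![CDATA[" + bel + "]]></a>"));
        System.out.println("BEL as &#7;:   " + parses("<a>&#7;</a>"));
    }
}
```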


Re: Illegal characters in xml

Posted by James Mao <ja...@iona.com>.
Most likely you need to filter all the illegal characters out of your
content before feeding it to the XML parser.
You need to filter all of them, not just 0x07.
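
A rough sketch of such a filter, assuming it is acceptable to drop the offending characters silently. The class name XmlCharFilter is hypothetical. The allowed ranges follow the Char production of the XML 1.0 spec: tab, line feed, carriage return, and then everything from 0x20 up, minus the surrogate block and 0xFFFE/0xFFFF:

```java
final class XmlCharFilter {
    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every illegal character removed. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // Walk by code point so valid surrogate pairs are kept whole;
            // an unpaired surrogate comes back as itself and is dropped.
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "page 1\u0007\u000Cpage 2\tend";
        System.out.println(strip(dirty)); // BEL and form feed removed, tab kept
    }
}
```

Call strip() on each String before it goes into the outgoing message. Note that this loses information; if the receiver needs the exact original bytes, base64 or an attachment is the safer route.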

James
