Posted to java-dev@axis.apache.org by Carsten Burghardt <ca...@cburghardt.com> on 2008/08/08 13:51:38 UTC

Encoding problem

Hi,

first of all, I know this is more a question for the user list, but
nobody there could help me, so apologies, but I'll try here as I don't
know how to continue. I have a web service (Axis 1.4) that connects to
an Alfresco server and stores metadata from emails (subject, sender,
...). This works fine with ISO-* or UTF-8 encoded emails. But once I
have an email in a more "exotic" character set such as KOI8-R (Russian),
I get an error on the server side because of invalid characters (like
0x1e). I know that the content contains no control characters, so I
watched the traffic with tcpmon and noticed that all characters were
garbled.
So I traced the Axis code and saw that the characters were encoded
as &#<hex>; in the SoapBody. Afterwards the DOM tree is serialized
in the DoAllSender class, and at that point the characters are broken
in the generated XML. When I switched the encoding of the SOAP message
from UTF-8 to KOI8-R, the characters showed up fine in tcpmon, but then
the server reported an error about a different illegal character (0x1),
probably because the message is converted to UTF-8 at some step.
So I guess my question is: what is the recommended way to transmit those
characters to a web service (apart from Base64-encoding them)?
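
For reference, a hedged sketch of the usual approach, not a description of what Axis 1.4 does internally: a Java String holds Unicode, so KOI8-R text only needs to be decoded once with the charset named in the mail headers, after which a UTF-8 message can carry it. The class and method names below (MailTextRoundTrip, decode, toUtf8Xml, fromXml) are hypothetical, the XML is written with the JDK's standard JAXP Transformer rather than with Axis, and the sketch assumes the JRE provides the KOI8-R charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Axis: checks that Cyrillic text decoded
// from KOI8-R survives serialization to UTF-8 XML and parsing back.
class MailTextRoundTrip {

    /** Decodes raw mail bytes using the charset named in the mail's headers. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    /** Wraps the text in a one-element document and serializes it as UTF-8. */
    static byte[] toUtf8Xml(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element subject = doc.createElement("subject"); // element name is illustrative
        subject.setTextContent(text);
        doc.appendChild(subject);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Parses the serialized bytes again and returns the element text. */
    static String fromXml(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Russian "Privet"
        // Stand-in for the bytes of a KOI8-R encoded mail header.
        byte[] koi8 = original.getBytes(Charset.forName("KOI8-R"));
        String decoded = decode(koi8, "KOI8-R");
        String roundTripped = fromXml(toUtf8Xml(decoded));
        System.out.println("text survived: " + original.equals(roundTripped));
    }
}
```

If the text survives here but not through the web service, the damage is more likely to come from how the mail bytes are turned into a String, or from a later re-encoding step, than from UTF-8 itself.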

Many thanks

Carsten


---------------------------------------------------------------------
To unsubscribe, e-mail: axis-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: axis-dev-help@ws.apache.org


Re: Encoding problem

Posted by WJ Krpelan <kr...@yahoo.com>.
Hi,
as to how to do this, I'd say take one of your breaking Russian characters and look it up, for example with one of the character utilities built into Windows (I can't recollect its name right now).
In principle I don't doubt these code points are correct.
There are even some Axis emails about a similar problem with Chinese characters; you could try to google for them.
If you can verify that Eclipse is able to display Russian characters, I would agree a bug is likely.

Proving it is another matter, but you only have to show that any one single character is being translated wrongly, giving the exact numeric value.
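
The lookup can also be done in Java itself. The following is a minimal sketch with hypothetical names (CodePointDump, describe, toNumericReferences): it prints the Unicode code point of every character in a string and shows what a hexadecimal character reference for each non-ASCII character would look like, so the values can be compared with those seen in the SoapBody. It is not an XML escaper (it leaves & and < untouched) and it is not Axis's own serializer.

```java
// Hypothetical helper for inspecting the code points of a string.
class CodePointDump {

    /** Returns one "U+XXXX" entry per code point, separated by spaces. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    /** Writes every non-ASCII code point as a hexadecimal character reference. */
    static String toNumericReferences(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u041E\u043A"; // two Cyrillic letters, U+041E and U+043A
        System.out.println(describe(sample));            // U+041E U+043A
        System.out.println(toNumericReferences(sample)); // &#x41E;&#x43A;
    }
}
```

Running describe on a string taken from the debugger avoids relying on whether the debugger's font can display Cyrillic.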
Cheers,
Wolfgang


--- On Mon, 8/11/08, Carsten Burghardt <ca...@cburghardt.com> wrote:

> From: Carsten Burghardt <ca...@cburghardt.com>
> Subject: Re: Encoding problem
> To: axis-dev@ws.apache.org
> Date: Monday, August 11, 2008, 4:40 PM
> Quoting "WJ Krpelan" <kr...@yahoo.com>:
>
> > You should check whether the actual hex values correspond to the
> > Unicode code points of your Russian characters.
>
> Hmm, how do I do this?
>
> > If this is the case, how did you verify the characters were broken
> > inside the DOM tree? Is your tool capable of showing Russian
> > characters?
>
> Yes, I debugged it with Eclipse, so I could see that the
> characters were not displayed correctly.


Re: Encoding problem

Posted by Carsten Burghardt <ca...@cburghardt.com>.
Quoting "WJ Krpelan" <kr...@yahoo.com>:

> Hi,
> hope I got this right.
> The encoding with &#<hex>; looks perfect to me.
> You should check whether the actual hex values correspond to the
> Unicode code points of your Russian characters.

Hmm, how do I do this?

> If this is the case, how did you verify the characters were broken
> inside the DOM tree? Is your tool capable of showing Russian
> characters?

Yes, I debugged it with Eclipse, so I could see that the
characters were not displayed correctly.

> Broken would mean that the numeric values in your UTF-8 XML do not
> correspond to the UTF-8 byte values of your Russian characters, which
> are quite different from the Unicode code points.


Re: Encoding problem

Posted by WJ Krpelan <kr...@yahoo.com>.
Hi,
hope I got this right.
The encoding with &#<hex>; looks perfect to me.
You should check whether the actual hex values correspond to the Unicode code points of your Russian characters.
If this is the case, how did you verify the characters were broken inside the DOM tree? Is your tool capable of showing Russian characters?
Broken would mean that the numeric values in your UTF-8 XML do not correspond to the UTF-8 byte values of your Russian characters, which are quite different from the Unicode code points.
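
To make the difference concrete, here is a small sketch in Java (the class and method names CodePointVersusUtf8, hex and utf8Hex are made up for illustration). It prints the code point of one Cyrillic letter next to the bytes that represent it in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: compares a character's code point with its UTF-8 bytes.
class CodePointVersusUtf8 {

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    /** Returns the UTF-8 byte sequence of the text as hex. */
    static String utf8Hex(String text) {
        return hex(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String o = "\u041E"; // CYRILLIC CAPITAL LETTER O
        System.out.println(String.format("code point:  U+%04X", o.codePointAt(0))); // U+041E
        System.out.println("UTF-8 bytes: " + utf8Hex(o));                           // D0 9E
    }
}
```

So a correct UTF-8 message carries this letter either as the two bytes D0 9E or as the character reference &#x41E;, and both stand for the same code point.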

HTH,
Wolfgang




