Posted to c-user@axis.apache.org by Graham Holden <Gr...@versionone.co.uk> on 2011/10/05 14:24:04 UTC

Problem extracting UTF-8 encoded data

Hello,

I've written a service in Axis2/C (version 1.6.0) that seems to work fine with normal ASCII data, but doesn't cope with UTF-8 encoded data.

A mailing-list post from 2009 (http://marc.info/?l=axis-user&m=126001317008901&w=2) says that this should work if the request conforms to SOAP specs -- I think mine does; the relevant TCPMON snippet is shown below.  The offending line, about half-way through is:

	<displayName>Archived File Name ùg</displayName>

The two "offending" bytes are 0xC3 (195) and 0xB9 (185) which, I believe, are the UTF-8 encoding of U+00F9 (ù - lowercase 'u' with grave), and viewing the payload as an XML file in Internet Explorer seems to confirm this.
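
The byte arithmetic can be checked by hand with a few lines of C. This is only a minimal sketch: the function name utf8_encode is made up for illustration and is not part of Axis2/C or of any XML parser. It applies the standard UTF-8 bit layout, in which a code point between U+0080 and U+07FF becomes the two bytes 110xxxxx 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+00F9 the top five bits are 00011 and the low six bits are 111001, so the two bytes come out as 0xC0|0x03 = 0xC3 and 0x80|0x39 = 0xB9, which matches the bytes seen in the capture.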

Without these two characters, everything works fine, but with them, the node is being returned (by "axiom_node_get_next_sibling()") with empty contents ("<displayName></displayName>").

Do I need to do anything to "turn on" utf-8 encoding?  Is this a problem with Axis?

Thanks in advance for any help people can give.

Graham Holden.

------ ------ ------ TCPMON Begin ------ ------ -----
SENDING DATA..
/* sending time = 9:57:14*/
/* message uuid = 92bf09dc-1f65-44d9-854c-ef341ba2d4b4*/
---------------------
POST /axis2/services/archive HTTP/1.1
SOAPAction: "http://localhost:9099/axis2/services/archive"
Content-Type: text/xml;
        charset=utf-8
Content-Length: 1456
User-Agent: httpPost
Host: localhost:9099
Cache-Control: no-cache

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<maintainTables>
<connection>
<server>127.0.0.1</server>
<port>31417</port>
<username>administrator</username>
<password>administrator</password>
<SID>
</SID>
</connection>
<action>check</action>
<layout>
<table>V1X3_V1XML</table>
<displayName>XML File Archive</displayName>
<fields>
<field>
<fieldName>ARCH_USER</fieldName>
<displayName>Deposited By</displayName>
<format>string</format>
<length>32</length>
</field>
<field>
<fieldName>ARCH_DATE</fieldName>
<displayName>Deposit Date</displayName>
<format>date</format>
<length>0</length>
</field>
<field>
<fieldName>BLOB</fieldName>
<displayName>BLOB ID</displayName>
<format>blob</format>
<length>0</length>
</field>
<field>
<fieldName>FILENAME</fieldName>
<displayName>Archived File Name ùg</displayName>
<format>string</format>
<length>255</length>
</field>
<field>
<fieldName>DOC_KEY</fieldName>
<displayName>Document Key</displayName>
<format>string</format>
<length>255</length>
</field>
<field>
<fieldName>DOC_ID</fieldName>
<displayName>Unique Reference</displayName>
<format>string</format>
<length>50</length>
</field>
<field>
<fieldName>REV_NUMBER</fieldName>
<displayName>Version</displayName>
<format>integer</format>
<length>0</length>
</field>
</fields>
</layout>
</maintainTables>
</soap:Body>
</soap:Envelope>
------ ------ ------ TCPMON End ------ ------ -----
----------------------------------------------------------------
Version One End-User Seminars

REGISTER NOW!
http://www.versionone.co.uk/seminars

----------------------------------------------------------------
Version One Ltd. is the author of intelligent electronic document
delivery and imaging software. This software enables the electronic
storage, retrieval, management, enhancement and delivery of business
documents such as invoices, purchase orders and statements. Version One's
'paperless office' technology is seamlessly integrated into all major 
finance and ERP systems. With a typical ROI of less than six months, 
Version One's solutions are enabling thousands of organisations to 
save dramatic amounts of time and money. 

Version One is a Member of BASDA (Business Application Software 
Developers Association) and ESA (European Software Association).

The opinions expressed within this email represent those of the 
individual and not necessarily those of Version One Limited. 
The contents of this email may be privileged and are confidential. 
It may not be disclosed to or used by anyone other than the addressee(s), 
nor copied in any way.

Version One Limited, Pentland House, Village Way, Wilmslow, Cheshire, SK9 2GH, UK.
Registered office: Munro House, Portsmouth Road, Cobham, Surrey, KT11 1TF, UK
Registered in England Number: 2443078   VAT Registration Number: 927 5479 83

Version One is an Advanced Computer Software PLC company.

-----------------------------------------------------------------

 

Think about the environment - Do you really need to print this email?

---------------------------------------------------------------------
To unsubscribe, e-mail: c-user-unsubscribe@axis.apache.org
For additional commands, e-mail: c-user-help@axis.apache.org


AW: Problem extracting UTF-8 encoded data

Posted by Stadelmann Josef <jo...@axa-winterthur.ch>.
Yes, the encoding.c module of libxml2 has the conversion routines isolat1ToUTF8() and UTF8Toisolat1().
It is also essential that the encoding is set properly in the Axiom routines.
I really wonder how guththila can manage without them.

It also depends on what encoding your platform uses. What encoding does Java use by default?
(Assuming you talk to a Java server, or from a Java client.)
Also, strictly speaking, XML and the related standards know no such thing as 8-bit ASCII.
As soon as bit 8 is set, the byte belongs to some extended encoding (an escape technique),
and you need to know which one. Is it UTF-8 or something else? If the data is encoded as UTF-8,
then you need to follow the byte-sequence rules of that encoding.
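
To illustrate the point about bytes with bit 8 set, here is a small C sketch of a UTF-8 well-formedness check. The function name utf8_is_valid is an assumption made for this example and is not an Axis2/C, guththila or libxml2 routine. It shows that the single byte 0xF9 (ù in ISO-8859-1) is not valid UTF-8 on its own, while the pair 0xC3 0xB9 is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if s[0..len) is well-formed UTF-8, else 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {                      /* plain 7-bit ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;   /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                    /* not a 10xxxxxx byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

Running a check like this over the raw request body is one way to find out whether the sender really produced UTF-8 or some other 8-bit encoding before blaming the parser.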

The libxml2 routines above help with that. We had to use them as well, because our OpenVMS service,
which is linked with ORACLE and PASCAL code, understands only ISO Latin-1 (ISO-8859-1).
But Latin-1 bytes cannot go straight into a SOAP XML stream that is declared as UTF-8.
So you need to convert from Latin-1 to UTF-8 and back.
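
The conversion in both directions is simple enough to sketch in plain C. Note that this is a hand-written stand-in, not the libxml2 code: the names latin1_to_utf8, utf8_to_latin1 and CONV_ERROR are invented for this example, and the libxml2 routines isolat1ToUTF8() and UTF8Toisolat1() have a different calling convention (they take in/out length pointers). The sketch relies only on the fact that every Latin-1 byte value equals its Unicode code point, so bytes 0x80 to 0xFF become two-byte UTF-8 sequences with lead byte 0xC2 or 0xC3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error marker returned by both converters. */
#define CONV_ERROR ((size_t)-1)

/* Convert ISO-8859-1 bytes to UTF-8.
 * Returns the number of bytes written, or CONV_ERROR if out is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outcap)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {                      /* ASCII is copied unchanged */
            if (o + 1 > outcap)
                return CONV_ERROR;
            out[o++] = c;
        } else {                             /* U+0080..U+00FF: two bytes */
            if (o + 2 > outcap)
                return CONV_ERROR;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}

/* Convert UTF-8 back to ISO-8859-1.
 * Returns the number of bytes written, or CONV_ERROR if the input is
 * malformed, contains a character above U+00FF, or out is too small. */
size_t utf8_to_latin1(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outcap)
{
    size_t o = 0;
    size_t i = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char c = in[i];
        if (o >= outcap)
            return CONV_ERROR;
        if (c < 0x80) {
            out[o++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < inlen
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* Lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF. */
            out[o++] = (unsigned char)(((c & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return CONV_ERROR;   /* not representable in Latin-1 */
        }
    }
    return o;
}
```

With this, the Latin-1 byte 0xF9 turns into 0xC3 0xB9 on the way into the SOAP message and back into 0xF9 on the way out. A character such as the euro sign (0xE2 0x82 0xAC in UTF-8) has no Latin-1 equivalent, so the reverse conversion reports an error instead of silently dropping it.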

Josef



-----Original Message-----
From: Rune Sindahl [mailto:rs@lpt.dk] 
Sent: Wednesday, 5 October 2011 14:47
To: 'Apache AXIS C User List'
Subject: SV: Problem extracting UTF-8 encoded data

Off the top of my head, I believe I had a similar issue. I traced the
problem to the XML parser (guththila). I recompiled the server
with the libxml XML parser instead.

Best regards,
Rune Sindahl



SV: Problem extracting UTF-8 encoded data

Posted by Rune Sindahl <rs...@lpt.dk>.
Off the top of my head, I believe I had a similar issue. I traced the
problem to the XML parser (guththila). I recompiled the server
with the libxml XML parser instead.

Best regards,
Rune Sindahl
