Posted to java-dev@axis.apache.org by David Melgar <dm...@us.ibm.com> on 2002/01/16 21:00:20 UTC

Double byte characters in SOAP msg question

This is sort of an XML/SOAP question, but folks here probably know the
answer...

Suppose I want to send a SOAP message with a text node that consists of
double-byte characters, such as Japanese characters.

Do I have to do anything special in Axis to create and send this message?
Do I have to do anything to make sure UTF-8 encoding is handled or does the
parser take care of it? It sounds like the default encoding in XML parsers
is UTF-16. Who does the conversion?

Has anyone tried this to make sure it works?

David Melgar
Web Services Toolkit Development
Emerging Technologies
dmelgar@us.ibm.com


[AXIS alpha3] Directory Hierarchy, e.g., lib.

Posted by Pae Choi <pa...@earthlink.net>.
I have just started playing with AXIS alpha3, and I have found
something nonsensical in the directory hierarchy, and the same
nonsense in the document "index.html" under the directory
<AXIS_HOME>{'/' | '\'}docs.

First, there are two "lib" directories, one under each of the
following directories:

  <AXIS_HOME>{'/' | '\'}
  <AXIS_HOME>{'/' | '\'}webapps{'/' | '\'}axis{'/' | '\'}WEB-INF

Both of them contain the same four JARs: "axis.jar",
"clutil.jar", "log4j-core.jar", and "wsdl4j.jar". Although the
Servlet spec v2.2 does not stop you from placing those common
packages under WEB-INF{'/' | '\'}lib, it is common practice not
to duplicate them in each WEB-INF{'/' | '\'}lib directory. Anyone
doing that is certainly not thinking efficiently.

Second, the section "Step 2: installing the dependencies" says:

"In the WEB-INF directory, you'll find a "lib" directory.  

 In this directory, copy the jars associated with the JAXP 1.1
 XML compliant parser of your choice.  This generally means
 either the xerces.jar from the xml-xerces distribution, or the
 crimson.jar and jaxp.jar from the JAXP 1.1 reference implementation."
 
Here is a real-life scenario. Say we have to create more than one
AXIS context, e.g., AXIS1 and AXIS2, both managed by the Servlet
container. We would have to place the JAXP v1.1 compliant parser
under the "lib" directory of each context, e.g.,

  AXIS1{'/' | '\'}WEB-INF{'/' | '\'}lib
  AXIS2{'/' | '\'}WEB-INF{'/' | '\'}lib

I know that we can make a symbolic link to a common place on
UNIX, but the same is not true on WIN32. Wouldn't it make sense
to place all those common JARs under the "lib" directory in
<AXIS_HOME>, making them available to all AXIS applications
in different AXIS contexts (e.g., the TC pattern for this situation),
and to keep the application-specific ones in either the classes
or lib directory under WEB-INF? Nothing stops someone who
prefers to place them in both directories, though.

Please clarify whether the documentation is misleading, resulting
in duplicated JARs in each context, or whether this is the way it
is and will remain.


Pae




Re: Double byte characters in SOAP msg question

Posted by Ryo Neyama <ne...@trl.ibm.co.jp>.
David,

> If I want to send a SOAP message that contains a text node containing that
> consists of double byte characters such as japanese characters.
>
> Do I have to do anything special in Axis to create and send this message?
> Do I have to do anything to make sure UTF-8 encoding is handled or does the
> parser take care of it? It sounds like the default encoding in XML parsers
> is UTF-16. Who does the conversion?
>
> Has anyone tried this to make sure it works?

It depends on how Axis calls an XML parser.

If an input XML document is provided as an InputStream, the XML parser
decides the encoding according to the "encoding" declaration in the XML
declaration. For example, in the case of <?xml version="1.0"
encoding="Shift_JIS"?>, the XML parser handles the input stream as
Shift_JIS, and Axis can handle the Shift_JIS characters within the XML
document as a Java string, i.e., a UTF-16 string. If the encoding is UTF-16
and there is a byte order mark at the beginning of the input stream, which
indicates whether the stream is big endian or little endian, the parser
will report a parsing error. This is because the byte order mark is not
allowed as the first character in an XML document.
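A minimal sketch of the InputStream case, assuming the JAXP DocumentBuilder that ships with a current JDK (Java 5 or later, for getTextContent) rather than the parser Axis used in 2002. The class and method names are illustrative only, not part of Axis. The document is encoded as Shift_JIS bytes, and the parser reads the encoding declaration to decode them into a Java String:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ByteStreamParse {

    // Parses raw bytes: the parser itself chooses the decoder by reading
    // the encoding declaration in the XML declaration.
    public static String parseText(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        // The result is an ordinary Java String (UTF-16 in memory)
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String japanese = "\u65e5\u672c\u8a9e"; // "Japanese" written in kanji
        String xml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><msg>" + japanese + "</msg>";
        // Encode the whole document as Shift_JIS so the bytes match the declaration
        byte[] bytes = xml.getBytes(Charset.forName("Shift_JIS"));
        String text = parseText(new ByteArrayInputStream(bytes));
        System.out.println(text.equals(japanese)); // prints true
    }
}
```

Nothing in the calling code names Shift_JIS when parsing; the conversion to a Java String is done entirely by the parser.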

If an input XML document is provided as an InputStreamReader with an
appropriate encoding, or some Reader wrapping such an InputStreamReader,
the XML parser receives characters that have already been decoded, and
therefore it ignores any "encoding" declaration. In this case, Axis can
handle the strings in the XML document correctly.
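The Reader case can be sketched the same way, again assuming the JDK's built-in JAXP parser and illustrative names. Here the bytes are really UTF-8 while the declaration still says Shift_JIS; because the InputStreamReader has already decoded them with the charset the caller chose, the parser does not consult the declaration:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharStreamParse {

    // Parses an already-decoded character stream; the parser performs no
    // byte decoding of its own, so the encoding declaration is not used.
    public static String parseText(Reader reader) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String japanese = "\u65e5\u672c\u8a9e";
        // The declaration claims Shift_JIS, but the bytes below are UTF-8
        String xml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><msg>" + japanese + "</msg>";
        byte[] utf8 = xml.getBytes(Charset.forName("UTF-8"));
        // The InputStreamReader does the decoding, using the charset we pass in
        Reader reader = new InputStreamReader(new ByteArrayInputStream(utf8), Charset.forName("UTF-8"));
        System.out.println(parseText(reader).equals(japanese)); // prints true
    }
}
```

If the same UTF-8 bytes were handed to the parser as an InputStream instead, it would decode them as Shift_JIS and the text would most likely come out garbled.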

Typically, the encoding is specified by the charset parameter of the
Content-Type header in SOAP over HTTP. RFC 3023 prescribes how to resolve
the encoding when it is specified in the Content-Type header, in the
"encoding" declaration, or in both. Axis should also follow the RFC, but
I haven't checked whether it does.
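As a rough illustration of the RFC 3023 precedence rules, here is a hypothetical helper (resolveEncoding is not an Axis API) that picks the charset from a Content-Type header value and an already-extracted encoding declaration. It is simplified: it ignores byte order mark detection and the +xml suffix media types. Note also that RFC 3023 has since been obsoleted by RFC 7303, which changed some of these defaults, notably for text/xml.

```java
import java.util.Locale;

public class Rfc3023Encoding {

    // Hypothetical helper: chooses the charset for an XML entity following
    // RFC 3023 precedence (simplified; no BOM sniffing, no */*+xml types).
    public static String resolveEncoding(String contentType, String declaredEncoding) {
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes around the parameter value
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                // An explicit charset parameter always wins
                return value;
            }
        }
        if (mediaType.equals("text/xml")) {
            // text/xml without charset defaults to us-ascii under RFC 3023;
            // the encoding declaration is not consulted
            return "us-ascii";
        }
        // application/xml without charset: fall back to the XML 1.0 rules
        // (the encoding declaration, else UTF-8)
        return declaredEncoding != null ? declaredEncoding : "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(resolveEncoding("text/xml; charset=Shift_JIS", "UTF-8")); // Shift_JIS
        System.out.println(resolveEncoding("text/xml", "Shift_JIS"));                 // us-ascii
        System.out.println(resolveEncoding("application/xml", "Shift_JIS"));          // Shift_JIS
        System.out.println(resolveEncoding("application/xml", null));                 // UTF-8
    }
}
```

The text/xml branch is the surprising one for SOAP 1.1, which uses text/xml: a sender that omits the charset parameter cannot rely on the encoding declaration alone under RFC 3023.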

Best regards,
    Ryo Neyama @ IBM Research, Tokyo Research Laboratory
    Internet Technology
    neyama@trl.ibm.co.jp