Posted to java-dev@axis.apache.org by "Davanum Srinivas (JIRA)" <ji...@apache.org> on 2005/07/19 13:21:47 UTC

[jira] Created: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Support for UTF-16 (and ability to add others later)
----------------------------------------------------

         Key: AXIS2-80
         URL: http://issues.apache.org/jira/browse/AXIS2-80
     Project: Apache Axis 2.0 (Axis2)
        Type: Bug
    Reporter: Davanum Srinivas


Folks,

Do we support UTF-16? Can we please add this to the list for 1.0? We need to structure things such that we can add other encodings later if necessary. Please review the UTF-8/UTF-16 tests in Axis 1.x (test/encoding) for inspiration.

thanks,
dims

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Eran Chinthaka (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/AXIS2-80?page=all ]

Eran Chinthaka updated AXIS2-80:
--------------------------------

    Fix Version: 0.91
                     (was: 0.9)

> Support for UTF-16 (and ability to add others later)
> ----------------------------------------------------
>
>          Key: AXIS2-80
>          URL: http://issues.apache.org/jira/browse/AXIS2-80
>      Project: Apache Axis 2.0 (Axis2)
>         Type: Bug
>   Components: transports
>     Reporter: Davanum Srinivas
>     Assignee: Ruchith Udayanga Fernando
>      Fix For: 0.91

>
> Folks,
> Do we support UTF-16? Can we please add this to the list for 1.0. We need to structure things such that we can add others later if necessary. Please review the UTF-8/UTF-16 tests in axis 1.x (test/encoding) for inspiration.
> thanks,
> dims



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Davanum Srinivas (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12316079 ] 

Davanum Srinivas commented on AXIS2-80:
---------------------------------------

We definitely need to port test\encoding\TestString2.java to ensure foreign language support is working.
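
A ported test could round-trip each sample string through both encodings. The sketch below is only an illustration and is not the Axis 1.x test itself: the class name EncodingRoundTrip, the wrapper element <s> and the sample string are assumptions, and it uses the JDK's StAX reader rather than any Axis2 class.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: encodes a string as XML bytes and parses it back. */
final class EncodingRoundTrip {

    /** Escapes the characters that must not appear raw in element text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps text in <s>...</s>, encodes it with the charset, then parses it back. */
    static String roundTrip(String text, Charset charset) {
        byte[] wire = ("<s>" + escape(text) + "</s>").getBytes(charset);
        try {
            Reader in = new InputStreamReader(new ByteArrayInputStream(wire), charset);
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything that precedes the root element
            }
            return reader.getElementText();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("round trip failed for " + charset, e);
        }
    }

    public static void main(String[] args) {
        // German umlauts written as Unicode escapes so the source file stays ASCII.
        String umlauts = "\u00c4\u00d6\u00dc \u00e4\u00f6\u00fc \u00df";
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16}) {
            System.out.println(cs + " preserved: " + umlauts.equals(roundTrip(umlauts, cs)));
        }
    }
}
```

Each ported test case would then reduce to comparing roundTrip(input, charset) with the input for every charset under test.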

-- dims



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Davanum Srinivas (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12316745 ] 

Davanum Srinivas commented on AXIS2-80:
---------------------------------------

Was reviewing Axis 1.x code for Kirill... we take the incoming request's encoding into account by peeking at the charset in the Content-Type MIME header. We probably should peek into the <?xml declaration as well. We will need to store this information and use it later when creating the response message.
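
For illustration, peeking at the charset parameter could look roughly like the sketch below. It is not the Axis 1.x code: the class and method names are invented, and the parsing is deliberately naive (it splits on ';' and would mishandle a quoted parameter value that itself contains a semicolon).

```java
import java.util.Locale;

/** Hypothetical helper for pulling the charset parameter out of a Content-Type value. */
final class ContentTypeCharset {

    /** Returns the charset parameter, or null when the header carries none. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; the parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // Parameter names are case-insensitive.
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional double quotes, as in charset="utf-16".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=utf-16"));
    }
}
```

The returned name is what would need to be stored with the request so that it is still available when the response is written.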

-- dims



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Simon Fell (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12316780 ] 

Simon Fell commented on AXIS2-80:
---------------------------------

For SOAP 1.1 at least, the specs are clear that the charset indicated in the Content-Type header overrides any charset you might sniff from the XML (i.e. you shouldn't be looking to see what charset the XML declaration claims).



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Ruchith Fernando (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12317016 ] 

Ruchith Fernando commented on AXIS2-80:
---------------------------------------

Hi all,

I used the Axis test case test\encoding\TestString2.java as the basis for a new test case that checks the handling of UTF-16 encoding by the StAX parser and serializer.

Tests:
11 tests from the above Axis test case were used:
 - testSimpleString
 - testStringWithApostrophes
 - testStringWithEntities
 - testStringWithRawEntities
 - testStringWithLeadingAndTrailingSpaces
 - testWhitespace
 - testFrenchAccents
 - testGermanUmlauts
 - testWelcomeUnicode
 - testWelcomeUnicode2
 - testWelcomeUnicode3

Of the above, 4 tests failed:
 - testStringWithLeadingAndTrailingSpaces
 - testWhitespace
   This is because the serialization removes the whitespace.

 - testGermanUmlauts
 - testWelcomeUnicode
   Still not sure of the reason for this failure.


I was able to remove the hard-coding of UTF-8 in the Axis2 transports and introduced a CHARACTER_SET_ENCODING option on the MessageContext. The transports now check for this property in the message context when creating the readers and writers; if it is not found, UTF-8 is used as the default.
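
The lookup with a UTF-8 fallback can be pictured roughly as follows. This is a sketch under assumptions, not the actual patch: a plain Map stands in for the message context, the key string is a guess at the constant's value, and the class and method names are invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Stand-in for a transport sender; the Map plays the role of the message context. */
final class EncodingOption {
    /** Property key; the real constant's value and location in Axis2 may differ. */
    static final String CHARACTER_SET_ENCODING = "CHARACTER_SET_ENCODING";

    /** Returns the configured charset, or UTF-8 when the property is absent. */
    static Charset resolve(Map<String, Object> messageContext) {
        Object value = messageContext.get(CHARACTER_SET_ENCODING);
        return value == null ? StandardCharsets.UTF_8 : Charset.forName(value.toString());
    }

    /** Creates the writer a transport would use for the outgoing message. */
    static Writer writerFor(Map<String, Object> messageContext, OutputStream out) {
        return new OutputStreamWriter(out, resolve(messageContext));
    }

    /** Demo helper: writes text with the resolved charset and returns the bytes. */
    static byte[] encode(Map<String, Object> messageContext, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = writerFor(messageContext, out)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Note that Charset.forName throws an unchecked exception for an unknown or malformed name, so a real transport would need to decide how to report that case.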

Right now I'm working on ensuring that the response message uses the same character encoding as the request message (by setting the CHARACTER_SET_ENCODING property in the operation context of the relevant message context).
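
One way to picture that propagation, again with plain maps standing in for the message and operation contexts. The class, the method names and the lookup order (message level first, then operation level, then UTF-8) are assumptions made for this sketch, not a description of the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: plain maps stand in for the message and operation contexts. */
final class EncodingPropagation {
    static final String CHARACTER_SET_ENCODING = "CHARACTER_SET_ENCODING";
    static final String DEFAULT_ENCODING = "UTF-8";

    /** Called when a request arrives: remember its encoding for the whole operation. */
    static void recordRequestEncoding(Map<String, Object> operationContext, String requestCharset) {
        if (requestCharset != null) {
            operationContext.put(CHARACTER_SET_ENCODING, requestCharset);
        }
    }

    /** Called when writing the response: message level, then operation level, then UTF-8. */
    static String responseEncoding(Map<String, Object> responseMessageContext,
                                   Map<String, Object> operationContext) {
        Object value = responseMessageContext.get(CHARACTER_SET_ENCODING);
        if (value == null) {
            value = operationContext.get(CHARACTER_SET_ENCODING);
        }
        return value == null ? DEFAULT_ENCODING : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> operation = new HashMap<>();
        recordRequestEncoding(operation, "UTF-16");
        // The response message context starts empty and inherits from the operation.
        System.out.println(responseEncoding(new HashMap<>(), operation));
    }
}
```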

I'll send in the patches and test cases at the end of day :-)

-Ruchith



[jira] Resolved: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Ruchith Udayanga Fernando (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/AXIS2-80?page=all ]
     
Ruchith Udayanga Fernando resolved AXIS2-80:
--------------------------------------------

    Fix Version: 0.9
     Resolution: Fixed

Axis2 now supports any character encoding scheme that is supported by the Java implementation.
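
Whether a given JVM knows an encoding can be checked with java.nio.charset.Charset. The helper below is a small illustration with an invented class and method name, not part of Axis2.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Hypothetical helper: asks the running JVM whether it knows a charset name. */
final class SupportedEncodings {

    /** True when this JVM has a charset registered under the given name or alias. */
    static boolean isKnown(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "UTF-16", "Shift_JIS", "no-such-charset"}) {
            System.out.println(name + ": " + isKnown(name));
        }
    }
}
```

Every Java platform is required to provide US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16; anything beyond that depends on the JVM in use.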



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Sanjiva Weerawarana (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12316939 ] 

Sanjiva Weerawarana commented on AXIS2-80:
------------------------------------------

Dims, the answer to both of your latest questions is that it's stuff the StAX parser/serializer has to do. There's nothing that can be done at the Axis2 level unless I'm missing something.

Simon, what you're pointing out is an HTTP rule IIRC, so the same applies to Axis2 for SOAP/HTTP too. So we need to look for the charset parameter of the content type and, if it's there, pass that down to the parser so it overrides the charset indicated in the XML declaration, if any.
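
A rough sketch of passing the header charset down to a StAX parser follows. It assumes the charset name has already been extracted from the Content-Type header (null when absent); the class and method names are invented. Wrapping the stream in a Reader means the bytes are decoded with the transport-level charset before the parser sees them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: picks the decoding strategy from the transport header. */
final class TransportCharsetParsing {

    /** Opens a StAX reader; headerCharset is null when Content-Type had no charset. */
    static XMLStreamReader open(InputStream body, String headerCharset) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        if (headerCharset != null) {
            // Decode the bytes ourselves so the transport-level charset is the one applied.
            return factory.createXMLStreamReader(
                    new InputStreamReader(body, Charset.forName(headerCharset)));
        }
        // No charset on the header: let the parser detect the encoding from the bytes.
        return factory.createXMLStreamReader(body);
    }

    /** Demo helper: returns the text content of the root element. */
    static String rootText(byte[] body, String headerCharset) {
        try {
            XMLStreamReader reader = open(new ByteArrayInputStream(body), headerCharset);
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything that precedes the root element
            }
            return reader.getElementText();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```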



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Davanum Srinivas (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12316773 ] 

Davanum Srinivas commented on AXIS2-80:
---------------------------------------

More feedback/questions from Kirill:

- XML 1.0 requires a BOM on entities encoded in UTF-16. Do you
think you could file a bug to make Axis emit it (if it doesn't already)?

- Does Axis always require the XML declaration for UTF-16, or is it optional?
  <?xml version="1.0" encoding="utf-16"?>
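
On the BOM question, one quick way to see what the JDK itself does is to look at the first two bytes an encoder produces. This is only a sketch of such a check: BomCheck and its method names are made up for illustration, and it exercises the JDK charset encoders directly, not the StAX serializer or any Axis code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical check for a leading UTF-16 byte order mark. */
final class BomCheck {

    /** True when the bytes begin with a UTF-16 byte order mark (FE FF or FF FE). */
    static boolean startsWithUtf16Bom(byte[] bytes) {
        if (bytes.length < 2) {
            return false;
        }
        int b0 = bytes[0] & 0xFF;
        int b1 = bytes[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    /** Encodes a tiny document with the charset and reports whether a BOM was written. */
    static boolean encoderWritesBom(Charset charset) {
        return startsWithUtf16Bom("<a/>".getBytes(charset));
    }

    public static void main(String[] args) {
        System.out.println("UTF-16   writes BOM: " + encoderWritesBom(StandardCharsets.UTF_16));
        System.out.println("UTF-16BE writes BOM: " + encoderWritesBom(StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE writes BOM: " + encoderWritesBom(StandardCharsets.UTF_16LE));
    }
}
```

Per the java.nio.charset.Charset documentation, the JDK's UTF-16 charset encodes big-endian and writes a byte order mark, while UTF-16BE and UTF-16LE write none, so the charset name handed to the serializer matters.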

-- dims



[jira] Commented: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Davanum Srinivas (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/AXIS2-80?page=comments#action_12316946 ] 

Davanum Srinivas commented on AXIS2-80:
---------------------------------------

Sanjiva,

- Axis's response should be in the same encoding as the request was.
- Let's at least confirm whether the StAX serializer currently emits a BOM or not.
- There are several locations in the code where utf-8 is hardcoded. We need to fix them properly.
- We need a flag to control whether the response should include the xml declaration or not.
- We need additional tests for serialization and de-serialization for foreign characters both with UTF-8 and UTF-16.
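
As a sketch of the declaration flag only: the boolean parameter below is hypothetical and does not correspond to an existing Axis2 option, and the class, method and element names are invented. It uses the standard StAX XMLStreamWriter and writes to a String, so it shows the declaration text rather than the byte-level encoding.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical serializer with a switch for the XML declaration. */
final class DeclarationFlag {

    /** Serializes <echo>text</echo>; writeDeclaration is the hypothetical flag. */
    static String serialize(String text, String encoding, boolean writeDeclaration) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            if (writeDeclaration) {
                // Emits the XML declaration naming the given encoding and version 1.0.
                writer.writeStartDocument(encoding, "1.0");
            }
            writer.writeStartElement("echo");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize("hi", "UTF-16", true));
        System.out.println(serialize("hi", "UTF-16", false));
    }
}
```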

thanks,
dims




[jira] Assigned: (AXIS2-80) Support for UTF-16 (and ability to add others later)

Posted by "Srinath Perera (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/AXIS2-80?page=all ]

Srinath Perera reassigned AXIS2-80:
-----------------------------------

    Assign To: Ruchith Udayanga Fernando
