Posted to soap-user@ws.apache.org by "Pospisil, Pavel" <Po...@gedas.cz> on 2005/03/10 15:41:41 UTC

Problem with encoding String data

Hello,
I am using SOAP 2.3.1 for our web service client implementation,
and I am now facing a problem with sending two Czech characters, "Ř" and "Á".
(All other Czech characters are encoded and transmitted without problems.)

Our WS client is running with the default encoding windows-1250,
but the communication needs to be in UTF-8.

public class AuthData 
{
  private String m_login;
  private String m_password;
  public AuthData(String login, String password)
  {
    m_login = login;
    m_password = password;
  }
}
.....
  public CiselnikPolozkaModel[] CiselnikModel(AuthData authData) throws Exception
  {
    CiselnikPolozkaModel[] returnVal = null;

    URL endpointURL = new URL(endpoint);
    Call call = new Call();
    call.setSOAPTransport(m_httpConnection);
    call.setTargetObjectURI("urn:Autoplus");
    call.setMethodName("CiselnikModel");
    call.setEncodingStyleURI(Constants.NS_URI_SOAP_ENC);

    Vector params = new Vector();
    params.addElement(new Parameter("authData", AuthData.class, authData, null));
    call.setParams(params);

    call.setSOAPMappingRegistry(m_smr);

    Response response = call.invoke(endpointURL, "urn:Autoplus#ws_autoplus#CiselnikModel");

    if (!response.generatedFault())
    {
      Parameter result = response.getReturnValue();
      returnVal = (CiselnikPolozkaModel[])result.getValue();
    }
    else
    {
      Fault fault = response.getFault();
      throw new SOAPException(fault.getFaultCode(), fault.getFaultString());
    }

    return returnVal;
  }
.....

Because the communication uses UTF-8 encoding, I need to encode Czech
characters into UTF-8 before constructing AuthData (or any other parameter
whose constructor takes a String); otherwise I get something like
<faultstring xsi:type="xsd:string">XML error not well-formed (invalid token)</faultstring>
from the server.

I tried a simple conversion for each String parameter, like this:

  AuthData a = new AuthData(toUTF8("ŽŠČ"), toUTF8("Řach"));

  private static String toUTF8(String input) 
          throws Exception
  {
    try
    {
      if (input != null)
      {
        return new String(input.getBytes("UTF8"));           
      }
      else
        return input;
    }
    catch(UnsupportedEncodingException e)
    {
      throw new Exception(e.getLocalizedMessage());
    }
  }

It works for all Czech characters except "Ř" and "Á".
In these two cases the conversion gives
Ř  -> Ĺ?,   Á -> Ă?
and the server returns the "not well-formed" error to me.
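
The conversion described above can be reproduced in isolation. The following is only a minimal sketch, assuming the JVM default charset is windows-1250 as stated earlier; the class name RoundTripDemo and the helpers misdecode and restore are hypothetical and not part of the client code. It decodes the UTF-8 bytes as windows-1250, which is what new String(bytes) does on such a JVM, and then checks whether the original UTF-8 bytes could still be recovered. The second UTF-8 byte of "Ř" is 0x98 and that of "Á" is 0x81, and both values are unassigned in windows-1250, so those bytes cannot survive the decoding step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Assumption: this stands in for the client's default charset.
    static final Charset DEFAULT = Charset.forName("windows-1250");

    // Same effect as new String(input.getBytes("UTF8")) on a windows-1250 JVM:
    // encode to UTF-8 bytes, then decode those bytes as windows-1250.
    static String misdecode(String input) {
        return new String(input.getBytes(StandardCharsets.UTF_8), DEFAULT);
    }

    // Re-encode the garbled string as windows-1250 and read the bytes as UTF-8.
    // This only returns the original text if no byte was lost in misdecode.
    static String restore(String garbled) {
        return new String(garbled.getBytes(DEFAULT), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding:
        // Z-caron, S-caron, C-caron, R-caron, A-acute.
        String[] samples = {"\u017D", "\u0160", "\u010C", "\u0158", "\u00C1"};
        for (String s : samples) {
            boolean survives = restore(misdecode(s)).equals(s);
            System.out.println("U+" + Integer.toHexString(s.charAt(0)).toUpperCase()
                    + " survives the round trip: " + survives);
        }
    }
}
```

Under these assumptions only the last two samples fail the round trip, which matches the two characters that break the request.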

I also tried to encode the string with

  private static String toUTF8(String input)
          throws Exception
  {
     try
     {
       if (input != null)
       {
         char[] ch = input.toCharArray();
         StringBuffer sb = new StringBuffer();
         for (int i = 0; i < ch.length; i++)
         {
           if ((int)ch[i] < 128)
             sb.append(ch[i]) ;   
           else 
           {
             sb.append("&#"+ (int)ch[i] +";");
           }
         }  
         return sb.toString();              
       }      
       else
       {
         return input;         
       }
     }
     catch(Exception e)
     {
       throw new Exception(e.getLocalizedMessage());
     }
  }

but in this case SOAP escapes my &#344;&#193; to &amp;#344;&amp;#193;.
The server returns no error, but the transmitted data are the literal text
&#344;&#193; instead of "Ř" and "Á" (of course :-)).
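
The double escaping can be shown with a small stand-alone sketch. The class EscapeDemo and the method escapeXmlText are hypothetical; escapeXmlText only stands in for whatever escaping the serializer applies to text content and is not the actual SOAP 2.3.1 code. Once the string already contains "&#344;", an XML text escaper has to treat the "&" as ordinary data and turn it into "&amp;".

```java
public class EscapeDemo {
    // Same idea as the second toUTF8 attempt: every non-ASCII character
    // becomes a decimal numeric character reference.
    static String toCharRefs(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append("&#").append((int) c).append(';');
            }
        }
        return sb.toString();
    }

    // Hypothetical stand-in for the escaping applied to element text on serialization.
    static String escapeXmlText(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String refs = toCharRefs("\u0158\u00C1");   // R-caron, A-acute
        System.out.println(refs);                   // &#344;&#193;
        System.out.println(escapeXmlText(refs));    // &amp;#344;&amp;#193;
    }
}
```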

Please help me with a hint on how to safely encode and decode String data
from the default windows-1250 to UTF-8 and send it to the WS.

Sincerely,
Pavel