Posted to dev@commons.apache.org by Rapheal Kaplan <ra...@mimir.net> on 2002/03/20 19:46:19 UTC

[HttpClient]Encoding

  I was working with a friend to determine the best way to read the 
contents of an HTTP response into a string.  Since he's working within the 
Jakarta framework, including the HttpClient, we decided to use that API.  The 
simplest way seems to be:

  HttpClient hc = new HttpClient();
  UrlGetMethod gm = new UrlGetMethod(query);
  hc.startSession(url,80);
  hc.executeMethod(gm);

  String htmlText = gm.getResponseBodyAsString();

  That seemed like a good idea, but I wanted to make sure that the encoding 
is handled correctly in getResponseBodyAsString.  I noticed there are also 
"byte[] getResponseBody" and getResponseBodyAsStream.  It does not look as 
though getResponseBodyAsString decodes the byte array properly.  Here is how 
it is written in org.apache.commons.httpclient.methods.GetMethod.java:

   /**
    * Return my response body, if any,
    * as a {@link String}.
    * Otherwise return <tt>null</tt>.
    */
   public String getResponseBodyAsString() {
       byte[] data = getResponseBody();
       if(null == data) {
           return null;
       } else {
           return new String(data);
       }
   }

  The problem is that the string is constructed using the default encoding of 
the VM, not the encoding in which the server may be sending the data.  For 
example, if the client requests a document written in Chinese, the server 
could well use an encoding entirely different from the VM default.
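
  To illustrate the mismatch, here is a minimal sketch, assuming a modern JDK 
(java.nio.charset.StandardCharsets) and a made-up class name 
DefaultEncodingDemo.  It stands in for the VM default with ISO-8859-1 and for 
the server encoding with UTF-8; neither choice comes from HttpClient itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HttpClient.
final class DefaultEncodingDemo {

    /** Decodes bytes with an explicitly named charset instead of the VM default. */
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file's own
        // encoding does not matter.
        String original = "\u4e2d\u6587";

        // Assumption: the server sends the text as UTF-8 (3 bytes per character here).
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: the VM default happens to be ISO-8859-1.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);
        String right = decode(wire, StandardCharsets.UTF_8);

        System.out.println(original.equals(right)); // true
        System.out.println(original.equals(wrong)); // false
        System.out.println(wrong.length());         // 6, one char per byte
    }
}
```

  The bytes are identical in both calls; only the charset used for decoding 
decides whether the original text comes back.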

  Of course I am not worried about the getResponseBody and 
getResponseBodyAsStream methods.  Those should expose binary data.  However, 
the get...AsString should do something like:

   /**
    * Return my response body, if any,
    * as a {@link String}.
    * Otherwise return <tt>null</tt>.
    */
   public String getResponseBodyAsString() {
       byte[] data = getResponseBody();
       if(null == data) {
           return null;
       } else {
           return new String(data, getResponseEncoding());
       }
   }

  Of course I am making up the method getResponseEncoding as an example.
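
  One way such a method could find the encoding is to read the charset 
parameter of the Content-Type response header.  The following is only a 
sketch under that assumption: the class ResponseEncodingSketch and its 
methods charsetFromContentType and bodyAsString are hypothetical, take the 
header value as a plain string, and are not part of HttpClient.  The fallback 
to ISO-8859-1 follows the HTTP/1.1 default for text media types and is also 
an assumption about what a real implementation would choose.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HttpClient.
final class ResponseEncodingSketch {

    /** Returns the charset named in a Content-Type value, or ISO-8859-1 if none is usable. */
    static Charset charsetFromContentType(String contentType) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        if (contentType == null) {
            return fallback;
        }
        // parts[0] is the media type; the remaining parts are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes the body with the charset from the header; null body stays null. */
    static String bodyAsString(byte[] data, String contentType) {
        return data == null ? null : new String(data, charsetFromContentType(contentType));
    }
}
```

  For example, bodyAsString(bytes, "text/html; charset=UTF-8") would decode 
the bytes as UTF-8 regardless of the VM default.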

  Likewise, I would recommend a getResponseAsReader method that would return 
an InputStreamReader set to the proper encoding.
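
  A rough sketch of what such a reader method might look like, assuming the 
response charset is already known.  The class ResponseReaderSketch and the 
methods asReader and readAll are made up for illustration and take a plain 
InputStream rather than any HttpClient type.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of HttpClient.
final class ResponseReaderSketch {

    /** Wraps the raw body stream in a Reader that decodes with the given charset. */
    static Reader asReader(InputStream body, Charset charset) {
        return new BufferedReader(new InputStreamReader(body, charset));
    }

    /** Drains a Reader into a String and closes it; shown only to exercise asReader. */
    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader in = reader) {
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

  Compared with building one large string, a reader lets the caller process a 
long response incrementally while still getting correctly decoded characters.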

  Has anyone given this problem any thought?  Or is this design intentional, 
with encoding handled somewhere else?  Are there other issues?

  If there is a desire to solve the encoding problem (assuming I am correct 
in thinking it is missing), I am quite willing to participate in the design 
and encoding.

  Thank you.

  - Rapheal Kaplan



