Posted to dev@commons.apache.org by Rapheal Kaplan <ra...@mimir.net> on 2002/03/20 19:46:19 UTC
[HttpClient]Encoding
Was working with a friend trying to determine the best way to read the
contents of an HTTP response into a string. Since he's working within the
Jakarta framework, including the HttpClient, we decided to use that API. The
simplest way seems to be:
HttpClient hc = new HttpClient();
UrlGetMethod gm = new UrlGetMethod(query);
hc.startSession(url,80);
hc.executeMethod(gm);
String htmlText = gm.getResponseBodyAsString();
I thought that seemed like a good idea, and wanted to check to make sure
that the encoding was working correctly in getResponseBodyAsString. I
noticed there is also "byte[] getResponseBody" and getResponseBodyAsStream.
It doesn't seem like getResponseBodyAsString would decode the byte array
properly. Here is how it is written in
org.apache.commons.httpclient.methods.GetMethod.java:
    /**
     * Return my response body, if any,
     * as a {@link String}.
     * Otherwise return <tt>null</tt>.
     */
    public String getResponseBodyAsString() {
        byte[] data = getResponseBody();
        if(null == data) {
            return null;
        } else {
            return new String(data);
        }
    }
The problem is that the string is constructed using the default encoding of
the VM, rather than the encoding the server is actually sending the data in.
For example, if the client is requesting a document written in Chinese, it
could well use an entirely different encoding.
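To make the concern concrete, here is a minimal, standalone sketch that does not use HttpClient at all. It assumes a modern JDK (java.nio.charset.StandardCharsets), and the class and method names are made up for illustration. It decodes the same bytes twice: once with the charset they were written in, and once with a different charset standing in for a mismatched VM default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of HttpClient.
class DefaultEncodingDemo {

    // Decode raw bytes using an explicitly chosen charset.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        // Pretend these are the bytes a server sent, encoded as UTF-8.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        String right = decode(wire, StandardCharsets.UTF_8);
        // ISO-8859-1 stands in here for a VM default that does not match.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        System.out.println(wrong.length());         // 6, one char per byte
    }
}
```

new String(data) with no charset argument behaves like the second call whenever the platform default differs from what the server used.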
Of course I am not worried about the getResponseBody and
getResponseBodyAsStream methods. Those should expose binary data. However,
the get...AsString should do something like:
    /**
     * Return my response body, if any,
     * as a {@link String}.
     * Otherwise return <tt>null</tt>.
     */
    public String getResponseBodyAsString() {
        byte[] data = getResponseBody();
        if(null == data) {
            return null;
        } else {
            return new String(data, getResponseEncoding());
        }
    }
Of course I am making up the method getResponseEncoding as an example.
Likewise, I would recommend a getResponseAsReader method that would return
an InputStreamReader set to the proper encoding.
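As a rough sketch of what such helpers might look like, independent of the HttpClient classes: the code below takes the raw Content-Type header value as a plain String, pulls out the charset parameter, and uses it for both a String and a Reader. Every name here (ResponseEncodingSketch, charsetFromContentType, bodyAsString, bodyAsReader) is hypothetical. Falling back to ISO-8859-1 when no usable charset is given is an assumption, based on the HTTP/1.1 default for text types; it also assumes a modern JDK.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of HttpClient.
class ResponseEncodingSketch {

    // Extract the charset parameter from a Content-Type header value,
    // e.g. "text/html; charset=UTF-8". Falls back to ISO-8859-1 (an
    // assumption) when the header or parameter is missing or unusable.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for the "charset=" prefix.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim();
                    // The value may be a quoted string; strip the quotes.
                    if (name.length() >= 2
                            && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use fallback.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decode a response body using the charset named in the header.
    static String bodyAsString(byte[] data, String contentType) {
        if (data == null) {
            return null;
        }
        return new String(data, charsetFromContentType(contentType));
    }

    // Wrap a response stream in a Reader using the same charset.
    static Reader bodyAsReader(InputStream body, String contentType) {
        return new InputStreamReader(body, charsetFromContentType(contentType));
    }
}
```

Keeping the header parsing in one place means the String and Reader variants cannot disagree about the encoding, while the byte[] and stream methods stay untouched for binary data.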
Has anyone given this problem any thought? Or, is this design intentional
and encoding is handled somewhere else? Are there other issues?
If there is a desire to solve the encoding problem (assuming I am correct
in thinking it is missing), I am quite willing to participate in the design
and coding.
Thank you.
- Rapheal Kaplan