Posted to httpclient-users@hc.apache.org by Franck MARTIN <fr...@gmail.com> on 2006/10/21 23:40:42 UTC
Can't parse http response from http://babelfish.altavista.com
Hi all,
I am trying to use the http://babelfish.altavista.com online translation tool
to dynamically translate text from one language to another.
But when I parse Babelfish's response to a request for a translation into
Russian or Greek, I get little squares instead of Russian or Greek
characters. So I tried calling getResponseBody() on the post method to get an
array of bytes that I could convert using UTF-8, ISO-8859-1, or UTF-16. No
matter which character encoding I use, I get those annoying little squares.
Here is my code. Maybe you can figure out what's wrong:
HttpClient client = new HttpClient();
HttpClientParams params = client.getParams();
params.setParameter("http.useragent",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4");
PostMethod post = new PostMethod("http://babelfish.altavista.com/tr");
NameValuePair[] id = {
    new NameValuePair("doit", "done"),
    new NameValuePair("intl", "1"),
    new NameValuePair("tt", "urltext"),
    new NameValuePair("trtext", "translate this!"),
    new NameValuePair("lp", "en_ru"),
    new NameValuePair("btnTrTxt", "Traduction")
};
post.setRequestBody(id);
post.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
    new DefaultHttpMethodRetryHandler(5, false));
try {
    status = client.executeMethod(post);
    if (status != HttpStatus.SC_OK) {
        message = "Altavista access restricted : status = " + status;
        //log.warn(message);
    }
    else {
        String encoding = post.getResponseCharSet();
        String altavistaResponse /*= new String(post.getResponseBody(), "ISO-8859-1")*/;
        //altavistaResponse = new String(post.getResponseBody(), "ISO-8859-7");
        altavistaResponse = new String(post.getResponseBody(), "UTF-8");
        //altavistaResponse = new String(post.getResponseBody(), "UTF-16");
        //altavistaResponse = post.getResponseBodyAsString();
        String translation = parseAltavistaResponse(altavistaResponse);
    }
}
catch (HttpException httpE) {
    message = "http error: " + httpE.getMessage();
    log.warn(message);
}
catch (IOException ioE) {
    message = "io error: " + ioE.getMessage();
    log.warn(message);
}
catch (FwkHttpParseException pe) {
    message = "FwkHttpParseException : " + pe.getMessage();
    log.warn(message);
}
finally {
    post.releaseConnection();
}

public String parseAltavistaResponse(String reponse) throws FwkHttpParseException {
    String translation;
    int pos = 0;
    pos = reponse.indexOf("<form", pos);
    if (pos == -1) {
        throw new FwkHttpParseException("Impossible to find '<form' within the response");
    }
    pos = reponse.indexOf("<div", pos);
    if (pos == -1) {
        throw new FwkHttpParseException("Impossible to find '<div' after <form within the response");
    }
    pos = reponse.indexOf(">", pos);
    if (pos == -1) {
        throw new FwkHttpParseException("Impossible to find '>' after '<div' within the response");
    }
    int pos2 = reponse.indexOf("</", pos);
    if (pos2 == -1) {
        throw new FwkHttpParseException("Impossible to find '</' after '<div...>' within the response");
    }
    translation = reponse.substring(pos + 1, reponse.indexOf("</", pos2));
    return translation;
}
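
To isolate the decoding step from the HTTP call, here is a minimal standalone
sketch (not part of the code above). It assumes Java 7 or later for
StandardCharsets, and the class name CharsetDecodingSketch and the helper
decode are hypothetical names used only for illustration. It encodes a short
Russian word as UTF-8 bytes, standing in for what getResponseBody() might
return, and decodes it with a matching and a non-matching charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecodingSketch {

    // Decode raw bytes with an explicitly chosen charset (hypothetical helper).
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // A six-letter Russian word, written with Unicode escapes so the
        // encoding of this source file does not matter.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // Stand-in for the response body: assumed here to be UTF-8 bytes.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String matching = decode(utf8Bytes, StandardCharsets.UTF_8);
        String mismatched = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(matching.equals(original));   // true
        System.out.println(mismatched.equals(original)); // false
        System.out.println(mismatched.length());         // 12: one char per byte

        // Printing code points instead of glyphs separates a decoding
        // problem from a console or font display problem.
        System.out.println(Integer.toHexString(matching.charAt(0))); // 41f
    }
}
```

If the code points come out right but the console still shows squares, the
string itself may be fine and the problem may lie in the font or in the
encoding of the output stream rather than in the decoding.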
I also tried adding headers to the request, such as a plausible Referer, but
it made no difference.
Please help!!!
Franck