Posted to httpclient-users@hc.apache.org by Hakka Ville <vh...@gmail.com> on 2006/11/18 13:44:55 UTC

how to detect charset encoding from "meta http-equiv" ?

Dear Sirs,

I am trying to use HttpClient. The server doesn't set the encoding in the
HTTP response header, but the page itself declares it with "meta http-equiv".
How can I tell HttpClient to detect the (Cyrillic) encoding from that tag?

Cheers,
Hakka

Re: HttpClient hung on remote connection.

Posted by Roland Weber <RO...@de.ibm.com>.
Hello Francis,

this sounds very much as if your connections are not released properly.
Make sure they are released in a finally {} clause, even in case of an
exception being thrown.
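
A minimal sketch of that pattern, using only the JDK. TinyPool and
ReleaseInFinally are hypothetical stand-ins for a bounded connection pool and
are not part of HttpClient; in HttpClient 3.x the release call is
releaseConnection() on the method object.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for a bounded connection pool; not an HttpClient class. */
final class TinyPool {
    private final Semaphore slots;

    TinyPool(int size) {
        slots = new Semaphore(size);
    }

    /** Takes a slot if one is free; returns false when the pool is exhausted. */
    boolean tryAcquire() {
        return slots.tryAcquire();
    }

    void release() {
        slots.release();
    }

    int available() {
        return slots.availablePermits();
    }
}

final class ReleaseInFinally {
    /** Runs one unit of work with a pooled slot; the slot is returned either way. */
    static boolean request(TinyPool pool, Runnable work) {
        if (!pool.tryAcquire()) {
            return false; // nothing free: every slot is held by someone else
        }
        try {
            work.run(); // may throw, e.g. on a network error
            return true;
        } finally {
            pool.release(); // runs on normal return and on exception
        }
    }
}
```

Without the finally block, every failed request would keep its slot, and once
all slots were gone later callers would find nothing left to acquire.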

hope that helps,
  Roland


HttpClient hung on remote connection.

Posted by "Li, Francis" <Fr...@FMR.COM>.
Hi there, 

We have a problem with HttpClient 3.0 and 3.0.1. We use
MultiThreadedHttpConnectionManager() for a connection pool of 50.
The timeout is set to 60 seconds and we are running the application on
WebSphere 6.0.2. Sometimes WAS detects a web container thread that has hung
for more than 600 seconds. According to our trace, executeMethod is the last
call we make; it never returns from HttpClient...

This ends up hanging all the worker threads, and we then have to restart the
app server. We don't have this issue with HttpClient 2.0.

Any suggestion and advice?

thanks in advance

--Francis.


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


RE: how to detect charset encoding from "meta http-equiv" ?

Posted by Julius Davies <ju...@cucbc.com>.
Hi, Hakka,

According to what I remember of the HTML spec, the first parts of the HTML content (<html><head><meta>...) should all be basic ascii (bytes 0 - 127).  So you can try reading the first KB or so until you encounter the <meta> tag.

Then you'll have to re-read with the encoding you've extracted!

I think almost every common encoding is compatible with ASCII in the lower half of the byte range (0 - 127), with UTF-16 and EBCDIC as notable exceptions.  It's only when the high bit of a byte is 1 that things get exciting.

Good luck!

You'll probably need to support all combinations of lower-case and upper-case (since all are possible in HTML 4):

<meta>
<metA>
<meTa>
<meTA>
<mEta>
<mEtA>
<mETa>
<mETA>
<Meta>
<MetA>
<MeTa>
<MeTA>
<MEta>
<MEtA>
<METa>
<META>

Maybe it's best just to convert whatever you find to lowercase before trying to extract the "http-equiv" attribute.
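
A rough sketch of that idea in Java, assuming the first chunk of the page has already been read into a byte array. MetaCharsetSniffer and its regular expression are illustrative names and choices of mine, not part of HttpClient, and a regex is only an approximation of real HTML parsing. A case-insensitive pattern is used here in place of lowercasing the text; it covers all the spellings listed above.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: looks for a charset declaration inside a <meta> tag. */
final class MetaCharsetSniffer {
    // Case-insensitive, so <meta>, <META>, <MeTa> and so on all match.
    // Captures the token that follows "charset=" within a single tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\b[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or null if none is found. */
    static String sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so this decode
        // cannot fail and leaves the ASCII markup unchanged.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

The returned name still has to be checked (for example with Charset.isSupported) before it is used to decode anything, since pages can declare names the JVM does not know.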


yours,

Julius

http://juliusdavies.ca/



Re: how to detect charset encoding from "meta http-equiv" ?

Posted by Roland Weber <ht...@dubioso.net>.
Hello Hakka,

> I tried to use httpclient, server doesn't set encoding within http response
> header, but does in the page itself with "meta http-equiv". How can I tell
> httpclient to detect (cyrillic) encoding from that thing ?

You can't. HttpClient will *never* consider the content it transports.
Start reading the response stream until you've got the encoding, then
create an InputStreamReader with that encoding for the rest of the data.
If the size of the page is small, you can also buffer the binary version
in memory and create the InputStreamReader on top of a ByteArrayInputStream
to read the whole document.
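
A minimal sketch of the buffering variant, assuming the charset name has
already been extracted from the page by some other means. BufferedDecode and
its method names are made up for illustration; only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

/** Hypothetical helper: buffer the raw bytes, then decode them afterwards. */
final class BufferedDecode {
    /** Copies the whole stream into memory so it can be read more than once. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Decodes the buffered bytes with the given charset name. */
    static String decode(byte[] body, String charsetName) throws IOException {
        // Throws UnsupportedEncodingException if the JVM does not know the name.
        Reader reader = new InputStreamReader(new ByteArrayInputStream(body), charsetName);
        StringBuilder text = new StringBuilder();
        char[] chars = new char[4096];
        int n;
        while ((n = reader.read(chars)) != -1) {
            text.append(chars, 0, n);
        }
        return text.toString();
    }
}
```

Because the bytes stay in memory, the same array can be scanned for the
encoding first and decoded afterwards without a second request.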

hope that helps,
  Roland

