Posted to dev@hc.apache.org by "Tarik Yilmaz (JIRA)" <ji...@apache.org> on 2014/12/23 16:13:13 UTC
[jira] [Created] (HTTPCLIENT-1590) Charset detection problem if
Content-Type header is text/html
Tarik Yilmaz created HTTPCLIENT-1590:
----------------------------------------
Summary: Charset detection problem if Content-Type header is text/html
Key: HTTPCLIENT-1590
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1590
Project: HttpComponents HttpClient
Issue Type: Bug
Affects Versions: 4.3.6
Reporter: Tarik Yilmaz
Priority: Critical
{code}
HttpClient client = HttpClients.createDefault();
HttpEntity entity = client.execute(new HttpGet(url)).getEntity();
String charset = ContentType.get(entity).getCharset().displayName();
{code}
The third line throws a NullPointerException: the Content-Type header carries no charset parameter, so getCharset() returns null.
Response headers:
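A minimal, JDK-only sketch of a null-safe lookup may make the failure mode clearer. It is not HttpClient code: the class HeaderCharset and its method fromContentType are hypothetical names, and the parsing is deliberately simplified. It returns null when the header value has no usable charset parameter, which is the case for a bare text/html. With HttpClient itself, the analogous guard would presumably be a null check on the result of getCharset() before calling displayName().

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not part of HttpClient.
final class HeaderCharset {

    /** Returns the charset named in a Content-Type value, or null if none is usable. */
    static Charset fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Element 0 is the media type; the remaining elements are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: treat as absent.
                    return null;
                }
            }
        }
        return null;
    }
}
```

A caller would check the result for null and fall back to another source, such as the document's meta tags, instead of dereferencing it directly.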
{code}
Cache-Control:private
Content-Encoding:gzip
Content-Length:16636
Content-Type:text/html
Date:Tue, 23 Dec 2014 14:06:13 GMT
Server:Microsoft-IIS/7.0
X-Powered-By:ASP.NET
{code}
Response meta tags:
{code}
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns:fb="http://ogp.me/ns/fb#">
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9" />
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1254" />
<link rel="SHORTCUT ICON" href="/favicon.ico" />
{code}
How can I get the real charset from the DOM object? I am using Jsoup to parse the document with the Jsoup.parse(InputStream, String, String) method.
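One possible approach, sketched here with the JDK only, is to sniff the charset from the meta tags in the first bytes of the body when the header gives none. The class MetaCharsetSniffer, its sniff method, and the regular expression are hypothetical and far less thorough than a real HTML parser. According to the Jsoup Javadoc, passing null as the charset name to Jsoup.parse(InputStream, String, String) makes Jsoup determine the charset from an http-equiv meta tag itself, falling back to UTF-8, so this step may not be needed at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or HttpClient.
final class MetaCharsetSniffer {

    // Matches charset=... inside a <meta ...> tag, with or without quotes.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the first supported charset declared in a meta tag, else the fallback. */
    static Charset sniff(byte[] head, Charset fallback) {
        // ISO-8859-1 maps every byte to exactly one char, so ASCII markup survives decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        while (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported name: try the next meta tag.
            }
        }
        return fallback;
    }
}
```

For the page in this report, which declares two conflicting meta tags, the sketch would pick the first declaration (ISO-8859-9), provided the JVM supports that charset. The sniffed bytes still have to be handed to the parser, so the body should be buffered, for example read fully into a byte array, before sniffing.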