Posted to dev@hc.apache.org by "Matthias Keller (Jira)" <ji...@apache.org> on 2019/11/12 14:47:00 UTC
[jira] [Created] (HTTPCLIENT-2029) URIBuilder cannot parse non-UTF8 URIs
Matthias Keller created HTTPCLIENT-2029:
-------------------------------------------
Summary: URIBuilder cannot parse non-UTF8 URIs
Key: HTTPCLIENT-2029
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2029
Project: HttpComponents HttpClient
Issue Type: Bug
Affects Versions: 4.5.10
Reporter: Matthias Keller
URIBuilder always decodes the query of the URI it is constructed from as UTF-8. For example, take the following URI, which still uses Latin-1 (ISO-8859-1):
http://host/?x=%E4
Here %E4 is the percent-encoded "ä" character in Latin-1.
new URIBuilder("http://host/?x=%E4").setCharset(ISO_8859_1).getQueryParams().get(0).getValue() outputs "�" (U+FFFD, the Unicode replacement character) instead of "ä".
This is because the URIBuilder constructor already parses the query parameters of the given URI, and at that point the charset is always null, so UTF-8 is used. Calling setCharset() afterwards does not re-parse the parameters that were already decoded.
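The byte-level mismatch can be reproduced with the JDK alone. The sketch below uses java.net.URLEncoder and java.net.URLDecoder (their Charset overloads require Java 10 or later); it is not URIBuilder's own code path, only an illustration of why decoding a Latin-1 escape as UTF-8 yields the replacement character. The class and helper names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of HttpClient.
public class PercentDecodingDemo {

    // Percent-encodes a value using the given charset (Java 10+ overload).
    public static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Percent-decodes an escaped value using the given charset (Java 10+ overload).
    public static String decode(String escaped, Charset charset) {
        return URLDecoder.decode(escaped, charset);
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00e4"; // "ä"

        // "ä" is one byte (0xE4) in ISO-8859-1 but two bytes (0xC3 0xA4) in UTF-8.
        System.out.println(encode(aUmlaut, StandardCharsets.ISO_8859_1)); // %E4
        System.out.println(encode(aUmlaut, StandardCharsets.UTF_8));      // %C3%A4

        // The lone byte 0xE4 is not a complete UTF-8 sequence, so decoding it
        // as UTF-8 produces U+FFFD instead of "ä".
        String asUtf8 = decode("%E4", StandardCharsets.UTF_8);
        String asLatin1 = decode("%E4", StandardCharsets.ISO_8859_1);
        System.out.printf("UTF-8 -> U+%04X, ISO-8859-1 -> U+%04X%n",
                (int) asUtf8.charAt(0), (int) asLatin1.charAt(0));
        // prints "UTF-8 -> U+FFFD, ISO-8859-1 -> U+00E4"
    }
}
```

Printing code points rather than the characters themselves keeps the output independent of the console's encoding.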
Proposed fix:
Provide overloaded constructors that also allow specifying the charset, for example:
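{code}
public URIBuilder(final String string, final Charset charset) throws URISyntaxException {
    // Set the charset first: digestURI() parses the query parameters immediately
    // and falls back to UTF-8 while this.charset is still null.
    this.charset = charset;
    digestURI(new URI(string));
}
{code}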
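The ordering the proposal relies on (store the charset, then parse) can be modeled outside HttpClient. The sketch below is a hypothetical stand-in class, MiniUriBuilder, that only imitates URIBuilder's "parse eagerly in the constructor" behavior; it is not the HttpClient class, its query parsing is a simplified assumption (split on "&" and "="), and it uses URI.create so that invalid input raises an unchecked IllegalArgumentException instead of URISyntaxException.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of URIBuilder's eager query parsing; NOT the HttpClient class.
public class MiniUriBuilder {
    private final Charset charset;
    private final List<Map.Entry<String, String>> queryParams;

    // Models the existing one-argument constructor: no charset known, so UTF-8 is used.
    public MiniUriBuilder(String uri) {
        this(uri, null);
    }

    // Models the proposed overload: the charset is stored BEFORE the query is parsed.
    public MiniUriBuilder(String uri, Charset charset) {
        this.charset = charset != null ? charset : StandardCharsets.UTF_8;
        this.queryParams = parseQuery(URI.create(uri).getRawQuery(), this.charset);
    }

    // Simplified form-style parsing: split on '&', then on the first '='.
    private static List<Map.Entry<String, String>> parseQuery(String rawQuery, Charset cs) {
        if (rawQuery == null || rawQuery.isEmpty()) {
            return Collections.emptyList();
        }
        List<Map.Entry<String, String>> params = new ArrayList<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? URLDecoder.decode(pair.substring(eq + 1), cs) : null;
            params.add(new SimpleImmutableEntry<>(URLDecoder.decode(name, cs), value));
        }
        return Collections.unmodifiableList(params);
    }

    public List<Map.Entry<String, String>> getQueryParams() {
        return queryParams;
    }

    public static void main(String[] args) {
        String uri = "http://host/?x=%E4";
        String byDefault = new MiniUriBuilder(uri).getQueryParams().get(0).getValue();
        String byLatin1 = new MiniUriBuilder(uri, StandardCharsets.ISO_8859_1)
                .getQueryParams().get(0).getValue();
        System.out.printf("default: U+%04X, ISO-8859-1: U+%04X%n",
                (int) byDefault.charAt(0), (int) byLatin1.charAt(0));
        // prints "default: U+FFFD, ISO-8859-1: U+00E4"
    }
}
```

Because parsing happens inside the constructor, a setter called afterwards could only take effect if it re-parsed the raw query; accepting the charset as a constructor argument avoids that extra state.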
--
This message was sent by Atlassian Jira
(v8.3.4#803005)