Posted to dev@hc.apache.org by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org> on 2007/09/21 19:01:08 UTC
[jira] Created: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
URI does not handle non-ASCII characters in host names correctly
----------------------------------------------------------------
Key: HTTPCLIENT-691
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-691
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient
Reporter: Christopher Sahnwaldt
Priority: Minor
URI uri = new URI("http://www.eisbär.de/eisbär?eis=bär", false);
System.out.println(uri.getHost());
System.out.println(uri.getPath());
System.out.println(uri.getQuery());
prints
www.eisb?r.de
/eisbär
eis=bär
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org
[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529508 ]
Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------
When I posted the description, JIRA apparently recognized the string as a URL. Let's try again, with the literal split so it is not detected:
URI uri = new URI("http"+"://bär.de/bär?bär=eis", false);
System.out.println(uri.getHost());
System.out.println(uri.getPath());
System.out.println(uri.getQuery());
System.out.println(uri.isHostname());
prints
b?r.de
/bär
bär=eis
false
[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531178 ]
Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------
Thanks for the reply! I'm looking forward to the 4.0 release, and meanwhile I'm happy with 3.x. Thanks for all your efforts!
[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529525 ]
Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------
With another small change in HttpHost, I managed to invoke a GetMethod:
Line 107 of the current SVN version:
this(uri.getHost(), uri.getPort(), Protocol.getProtocol(uri.getScheme()));
Change it to:
this(new String(uri.getRawHost()), uri.getPort(), Protocol.getProtocol(uri.getScheme()));
This is just a hack though; for a clean solution a method getEscapedHost() should be added to URI.
When I tested this, HttpClient tried to create a Socket and got an UnknownHostException for hosts with non-ASCII characters. I don't know whether that is Java's fault or whether our DNS is misconfigured, but at least it's the same result I get with java.net.URL.openStream() :-)
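A possible explanation for the UnknownHostException (an assumption, not confirmed in this thread): IDNA (RFC 3490) expects applications to convert internationalized host names to their ASCII-compatible "xn--" (Punycode) form before a DNS lookup, so neither the raw Unicode host nor a percent-escaped one is likely to resolve. The sketch below uses java.net.IDN (Java 6+, not part of HttpClient 3.x) to show that conversion; the class name is made up, and handing the result to the socket layer is only a guess at what a fix could look like, not what HttpHost does.

```java
import java.net.IDN;

public class IdnHostDemo {

    // Converts a Unicode host name to its ASCII-compatible (Punycode) form,
    // label by label, as defined by IDNA (RFC 3490).
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        String host = "www.eisbär.de";
        String ascii = toAsciiHost(host);
        // Only the non-ASCII label gets an "xn--" prefix; "www" and "de" stay as they are.
        System.out.println(ascii);
        // The conversion can be reversed, e.g. for display.
        System.out.println(IDN.toUnicode(ascii));
        // A hypothetical fix would pass the ASCII form to the resolver,
        // e.g. java.net.InetAddress.getByName(ascii), instead of the raw host.
    }
}
```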
[jira] Closed: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
Posted by "Roland Weber (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roland Weber closed HTTPCLIENT-691.
-----------------------------------
Resolution: Won't Fix
Hello Christopher,
I'm sorry, but we're not going to hack IDN support into 3.x anymore. It is stable and final. We'll fix bugs, but there will be no new features added to the 3.x codebase.
As you have noticed, the URI class is a mess. You don't know whether your changes break something somewhere else, and neither do we. Changing HttpHost to use getRawHost instead of getHost is almost certain to break other people's applications. You should apply such patches to your local copy; that's the beauty of open source software.
HttpClient 4.0 is using the Java URI class, though it's not ready for production use yet. All our efforts go into the new codebase that will make HttpClient 3.1 obsolete.
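As an aside (an illustration added here, not from the thread): java.net.URI does not accept bär.de as a server-based host either. Its parser falls back to a registry-based authority, so getHost() returns null while getAuthority() keeps the original text. Converting the host with java.net.IDN first and using the seven-argument constructor yields a URI with a usable ASCII host. The class and method names below are hypothetical, and whether HttpClient 4.0 takes this route is not stated in this thread.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

public class JavaNetUriDemo {

    // Parses the string directly; the non-ASCII host is not accepted as a
    // server-based host, so the authority is kept as registry-based.
    static URI parseDirect() {
        return URI.create("http://bär.de/bär?bär=eis");
    }

    // Builds the URI from components after converting the host with IDNA.
    static URI buildWithIdnHost() {
        try {
            String asciiHost = IDN.toASCII("bär.de");
            return new URI("http", null, asciiHost, -1, "/bär", "bär=eis", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI direct = parseDirect();
        System.out.println(direct.getHost());      // null
        System.out.println(direct.getAuthority()); // bär.de
        URI built = buildWithIdnHost();
        System.out.println(built.getHost());       // the "xn--" form of bär.de
        System.out.println(built.toASCIIString()); // path and query escaped as UTF-8
    }
}
```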
cheers,
Roland
[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529511 ]
Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------
I added a few more lines that may help to clarify the issue:
URI uri = new URI("http"+"://bär.de/bär?bär=eis", false);
System.out.println(uri.getHost());
System.out.println(uri.getRawHost());
System.out.println(uri.getPath());
System.out.println(uri.getRawPath());
System.out.println(uri.getQuery());
System.out.println(uri.getRawQuery());
System.out.println(uri.isHostname());
prints
b?r.de
bär.de
/bär
/b%C3%A4r
bär=eis
b%C3%A4r=eis
false
For path and query, the 'raw' version is properly escaped and the other version is unescaped. For host, the 'raw' version is *not* escaped, but getHost() still tries to unescape it and messes things up.
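The "b?r.de" output is consistent with getHost() converting the raw host to US-ASCII bytes before percent-decoding: String.getBytes replaces every unmappable character with '?', so an unescaped 'ä' is lost before decoding even starts, while an escaped path survives because it is already pure ASCII. The sketch below models that assumed pipeline in plain Java; the class and method names are made up for illustration, and the real URI decoding code in 3.x may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HostUnescapeDemo {

    // Decodes %XX escapes; every other byte is copied through unchanged.
    static byte[] percentDecode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits
                    continue;
                }
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    // Assumed model of the unescaping getters: raw chars -> US-ASCII bytes
    // -> percent-decode -> UTF-8 string. Not HttpClient's actual code.
    static String unescape(String raw) {
        // Unmappable characters such as 'ä' become '?' here.
        byte[] ascii = raw.getBytes(StandardCharsets.US_ASCII);
        return new String(percentDecode(ascii), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(unescape("bär.de"));      // b?r.de (raw host was never escaped)
        System.out.println(unescape("b%C3%A4r.de")); // bär.de (escaped host decodes correctly)
        System.out.println(unescape("/b%C3%A4r"));   // /bär
    }
}
```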
[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly
Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529523 ]
Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------
A small change in URI.parseAuthority() fixes the problem (and hopefully does not cause others...)
Here are lines 2215 and 2216 (current SVN version of URI):
// REMINDME: it doesn't need the pre-validation
_host = original.substring(from, next).toCharArray();
Change these lines to
// REMINDME: it doesn't need the pre-validation
_host = (escaped) ? original.substring(from, next).toCharArray()
: encode(original.substring(from, next), hostname, charset);
and the problem is almost fixed; the test now prints:
bär.de
b%C3%A4r.de
/bär
/b%C3%A4r
bär=eis
b%C3%A4r=eis
false
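For reference, escaping the host as UTF-8 is what turns bär.de into b%C3%A4r.de: 'ä' (U+00E4) encodes to the two bytes C3 A4. The sketch below is a hypothetical, simplified stand-in for the encode(original.substring(from, next), hostname, charset) call in the patch; it assumes UTF-8 as the charset and keeps only ASCII letters, digits, '-' and '.' unescaped.

```java
import java.nio.charset.StandardCharsets;

public class HostEscapeDemo {

    // Hypothetical stand-in for the patched escaping: keeps ASCII letters,
    // digits, '-' and '.', and percent-escapes every other UTF-8 byte.
    static String escapeHost(String host) {
        StringBuilder sb = new StringBuilder();
        for (byte b : host.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.';
            if (allowed) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHost("bär.de"));        // b%C3%A4r.de
        System.out.println(escapeHost("www.eisbär.de")); // www.eisb%C3%A4r.de
    }
}
```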
The raw hostname is now escaped, but URI.isHostname() is still false. Another small change fixes that.
Here are lines 1117 to 1119:
hostname.or(toplabel);
// hostname.or(domainlabel);
hostname.set('.');
Add another line so that % is also valid in hostnames:
hostname.or(toplabel);
// hostname.or(domainlabel);
hostname.set('.');
hostname.set('%');
With these changes, the test prints
bär.de
b%C3%A4r.de
/bär
/b%C3%A4r
bär=eis
b%C3%A4r=eis
true
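The effect of the second change can be illustrated with java.util.BitSet: once '%' is in the allowed set, a percent-escaped host such as b%C3%A4r.de passes a character-by-character check. The set below is a simplified, hypothetical version of URI.hostname (the real one is assembled from toplabel and related label sets), and isHostname() here is a plain membership test rather than the class's actual validation.

```java
import java.util.BitSet;

public class HostnameCharsDemo {

    // Simplified, hypothetical stand-in for URI.hostname: ASCII letters,
    // digits, '-' and '.', optionally plus '%'.
    static BitSet hostnameChars(boolean allowPercent) {
        BitSet set = new BitSet(256);
        for (char c = 'a'; c <= 'z'; c++) set.set(c);
        for (char c = 'A'; c <= 'Z'; c++) set.set(c);
        for (char c = '0'; c <= '9'; c++) set.set(c);
        set.set('-');
        set.set('.');
        if (allowPercent) {
            set.set('%'); // mirrors the added hostname.set('%') line
        }
        return set;
    }

    // Plain membership test: every character must be in the allowed set.
    static boolean isHostname(String raw, BitSet allowed) {
        for (int i = 0; i < raw.length(); i++) {
            if (!allowed.get(raw.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String escaped = "b%C3%A4r.de";
        System.out.println(isHostname(escaped, hostnameChars(false))); // false
        System.out.println(isHostname(escaped, hostnameChars(true)));  // true
    }
}
```

This check only looks at individual characters, so a lone '%' would pass as well; stricter validation would also require each '%' to be followed by two hex digits.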