Posted to dev@hc.apache.org by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org> on 2007/09/21 19:01:08 UTC

[jira] Created: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

URI does not handle non-ASCII characters in host names correctly
----------------------------------------------------------------

                 Key: HTTPCLIENT-691
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-691
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient
            Reporter: Christopher Sahnwaldt
            Priority: Minor


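import org.apache.commons.httpclient.URI;

// second argument false: the string is not in escaped form yet
URI uri = new URI("http://www.eisbär.de/eisbär?eis=bär", false);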
System.out.println(uri.getHost());
System.out.println(uri.getPath());
System.out.println(uri.getQuery());

prints

www.eisb?r.de
/eisbär
eis=bär


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org


[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529508 ] 

Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------

When I posted the description, JIRA apparently recognized the string as a URL. Let's try again:

URI uri = new URI("http"+"://bär.de/bär?bär=eis", false);
System.out.println(uri.getHost());
System.out.println(uri.getPath());
System.out.println(uri.getQuery());
System.out.println(uri.isHostname());

prints

b?r.de
/bär
bär=eis
false


[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531178 ] 

Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------

Thanks for the reply! I'm looking forward to the 4.0 release, and meanwhile I'm happy with 3.x. Thanks for all your efforts!


[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529525 ] 

Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------

With another small change in HttpHost, I managed to invoke a GetMethod:

Line 107 of the current SVN version of HttpHost:

this(uri.getHost(), uri.getPort(), Protocol.getProtocol(uri.getScheme()));

change to

this(new String(uri.getRawHost()), uri.getPort(), Protocol.getProtocol(uri.getScheme()));

This is just a hack, though; a clean solution would be to add a getEscapedHost() method to URI.

When I tested this, HttpClient tried to create a Socket and got an UnknownHostException for hosts with non-ASCII characters. I don't know whether that is Java's fault or our DNS is not configured correctly, but at least it's the same result I get with java.net.URL.openStream() :-)
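A likely explanation for the UnknownHostException: DNS host names are ASCII, so an internationalized name such as bär.de normally has to be converted to its ASCII-compatible "xn--" form (IDNA, RFC 3490) before it is resolved, and apparently neither code path does that here. The following is a minimal sketch that assumes Java 6 or later, where java.net.IDN performs the conversion. IdnLookupSketch, toAsciiHost and toUnicodeHost are illustrative names, not part of HttpClient.

```java
import java.net.IDN;

/**
 * Sketch (illustrative names): convert an internationalized host name to
 * the ASCII-compatible form a resolver can look up, and back for display.
 */
final class IdnLookupSketch {

    /** Applies the IDNA ToASCII operation to each label, e.g. "bär.de". */
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    /** Reverses the conversion, e.g. for showing the host to a user. */
    static String toUnicodeHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        String ascii = toAsciiHost("b\u00e4r.de");  // "bär.de"
        System.out.println(ascii);                  // xn--br-via.de
        System.out.println(toUnicodeHost(ascii));   // bär.de
    }
}
```

The ASCII form is what would have to be passed to the Socket or InetAddress lookup; the Unicode form is only for display.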


[jira] Closed: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

Posted by "Roland Weber (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Weber closed HTTPCLIENT-691.
-----------------------------------

    Resolution: Won't Fix

Hello Christopher,

I'm sorry, but we're not going to hack IDN support into 3.x anymore. It is stable and final. We'll fix bugs, but no new features will be added to the 3.x codebase.
As you have noticed, the URI class is a mess. You don't know whether your changes break something somewhere else, and neither do we. Changing HttpHost to use getRawHost instead of getHost is almost certain to break other people's applications. You should apply such patches to your local copy; that's the beauty of open source software.

HttpClient 4.0 uses the Java URI class (java.net.URI), though 4.0 is not ready for production use yet. All our efforts go into the new codebase that will make HttpClient 3.1 obsolete.

cheers,
  Roland
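For comparison, here is a sketch of how the same example can be built with java.net.URI, the class that HttpClient 4.0 builds on according to the message above. It assumes Java 6 or later and converts the host with java.net.IDN before constructing the URI. This is an illustration, not HttpClient 4.0 code; JavaNetUriSketch and build are made-up names.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

/** Sketch (made-up names): build a java.net.URI from unescaped parts. */
final class JavaNetUriSketch {

    static URI build(String unicodeHost, String path, String query)
            throws URISyntaxException {
        // Host names must be ASCII, so convert the IDN host first.
        String asciiHost = IDN.toASCII(unicodeHost);
        // Arguments: scheme, userInfo, host, port (-1 = none), path, query, fragment.
        return new URI("http", null, asciiHost, -1, path, query, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // bär.de, /bär, bär=eis
        URI uri = build("b\u00e4r.de", "/b\u00e4r", "b\u00e4r=eis");
        System.out.println(uri.getHost());        // xn--br-via.de
        System.out.println(uri.getPath());        // /bär
        System.out.println(uri.toASCIIString());  // http://xn--br-via.de/b%C3%A4r?b%C3%A4r=eis
    }
}
```

The multi-argument constructor leaves non-ASCII characters in the path and query as they are; toASCIIString() percent-encodes them as UTF-8, giving the same b%C3%A4r escapes that getRawPath() and getRawQuery() show in the 3.x output.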


[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529511 ] 

Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------

I added a few more lines that may help clarify the issue:

URI uri = new URI("http"+"://bär.de/bär?bär=eis", false);
System.out.println(uri.getHost());
System.out.println(uri.getRawHost());
System.out.println(uri.getPath());
System.out.println(uri.getRawPath());
System.out.println(uri.getQuery());
System.out.println(uri.getRawQuery());
System.out.println(uri.isHostname());

prints

b?r.de
bär.de
/bär
/b%C3%A4r
bär=eis
b%C3%A4r=eis
false

For path and query, the 'raw' version is properly escaped and the other version is unescaped. For the host, however, the 'raw' version is *not* escaped, yet getHost() still tries to unescape it and garbles the non-ASCII character.
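The following is a minimal sketch of the symptom described above. It assumes that unescaping first turns the raw characters into US-ASCII bytes and then decodes %XX sequences as UTF-8; the thread does not show HttpClient's actual decoding code. Under that assumption, an escaped path survives the round trip, while an unescaped 'ä' is replaced by '?' before decoding even starts. UnescapeSketch and unescape are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical unescape routine that assumes its input is already escaped. */
final class UnescapeSketch {

    static String unescape(String raw) {
        // Unmappable characters such as 'ä' become '?' in this step.
        byte[] ascii = raw.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < ascii.length; i++) {
            if (ascii[i] == '%' && i + 2 < ascii.length) {
                int hi = Character.digit(ascii[i + 1], 16);
                int lo = Character.digit(ascii[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);  // decoded byte of %XX
                    i += 2;
                    continue;
                }
            }
            out.write(ascii[i]);  // copy everything else unchanged
        }
        // Interpret the decoded bytes as UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(unescape("/b%C3%A4r"));    // raw path is escaped: /bär
        System.out.println(unescape("b\u00e4r.de"));  // raw host is not: b?r.de
    }
}
```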


[jira] Commented: (HTTPCLIENT-691) URI does not handle non-ASCII characters in host names correctly

Posted by "Christopher Sahnwaldt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529523 ] 

Christopher Sahnwaldt commented on HTTPCLIENT-691:
--------------------------------------------------

A small change in URI.parseAuthority() fixes the problem (and hopefully does not cause others...)

Here are lines 2215 and 2216 (current SVN version of URI):

// REMINDME: it doesn't need the pre-validation
_host = original.substring(from, next).toCharArray();

Change these lines to

// REMINDME: it doesn't need the pre-validation
_host = (escaped) ? original.substring(from, next).toCharArray()
    : encode(original.substring(from, next), hostname, charset);

and the problem is almost fixed:

bär.de
b%C3%A4r.de
/bär
/b%C3%A4r
bär=eis
b%C3%A4r=eis
false
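The effect of the proposed parseAuthority() change can be sketched in isolation: keep the host verbatim when the caller says it is already escaped, otherwise percent-encode it. This is a simplified stand-in for URI.encode(). It assumes UTF-8 as the charset, which is consistent with the b%C3%A4r.de output, and treats only ASCII letters, digits, '-' and '.' as safe. HostEscapeSketch and escapeHost are illustrative names.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative stand-in for escaping the host during parsing. */
final class HostEscapeSketch {

    /** Bytes left as-is: ASCII letters, digits, '-' and '.'. */
    private static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                || (b >= '0' && b <= '9') || b == '-' || b == '.';
    }

    /** Mirrors the proposed ternary: escaped ? as-is : percent-encoded. */
    static String escapeHost(String host, boolean escaped) {
        if (escaped) {
            return host;
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : host.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;  // treat the byte as unsigned
            if (isSafe(v)) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHost("b\u00e4r.de", false));  // b%C3%A4r.de
        System.out.println(escapeHost("b%C3%A4r.de", true));   // unchanged
    }
}
```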


The raw hostname is now escaped, but URI.isHostname() is still false. Another small change fixes that.

Here are lines 1117 to 1119:

hostname.or(toplabel);
// hostname.or(domainlabel);
hostname.set('.');

Add another line so that % is also valid in hostnames:

hostname.or(toplabel);
// hostname.or(domainlabel);
hostname.set('.');
hostname.set('%');

With these changes, the test prints

bär.de
b%C3%A4r.de
/bär
/b%C3%A4r
bär=eis
b%C3%A4r=eis
true
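The isHostname() fix relies on java.util.BitSet character classes, as in the snippet above. The following is a simplified sketch that uses letters, digits and '-' as a stand-in for HttpClient's toplabel set, whose actual definition is not shown in this thread. HostnameCharsSketch and its methods are hypothetical. The sketch shows why b%C3%A4r.de is rejected until '%' is added to the set.

```java
import java.util.BitSet;

/** Hypothetical, simplified version of a hostname character class check. */
final class HostnameCharsSketch {

    static BitSet hostnameChars(boolean allowPercent) {
        BitSet alnumDash = new BitSet(256);
        for (char c = 'a'; c <= 'z'; c++) {
            alnumDash.set(c);
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            alnumDash.set(c);
        }
        for (char c = '0'; c <= '9'; c++) {
            alnumDash.set(c);
        }
        alnumDash.set('-');

        BitSet hostname = new BitSet(256);
        hostname.or(alnumDash);  // stands in for hostname.or(toplabel)
        hostname.set('.');
        if (allowPercent) {
            hostname.set('%');   // the one-line change proposed above
        }
        return hostname;
    }

    /** True if the string is non-empty and every char is in the set. */
    static boolean isHostname(String candidate, BitSet allowed) {
        if (candidate.isEmpty()) {
            return false;
        }
        for (int i = 0; i < candidate.length(); i++) {
            if (!allowed.get(candidate.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Adding '%' only checks characters one at a time. It does not verify that each '%' is followed by two hex digits, so the check becomes somewhat looser than a full escape-aware validation.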
