Posted to dev@hc.apache.org by "Oleg Kalnichevski (JIRA)" <ji...@apache.org> on 2014/03/18 14:57:44 UTC

[jira] [Commented] (HTTPCLIENT-1486) Quirky Behavior in URIUtils leads to Improper Request Execution

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939227#comment-13939227 ] 

Oleg Kalnichevski commented on HTTPCLIENT-1486:
-----------------------------------------------

I do not think there is much we can do about it. The URI in question is represented internally by java.net.URI as:

{noformat}
scheme = {java.lang.String@470}"http"
fragment = null
authority = {java.lang.String@426}"www.example.com:80somepath"
userInfo = null
host = null
port = -1
path = {java.lang.String@473}"/someresource.html"
query = null
schemeSpecificPart = {java.lang.String@474}"//www.example.com:80somepath/someresource.html"
hash = 0
decodedUserInfo = null
decodedAuthority = {java.lang.String@426}"www.example.com:80somepath"
decodedPath = null
decodedQuery = null
decodedFragment = null
decodedSchemeSpecificPart = null
string = {java.lang.String@392}"http://www.example.com:80somepath/someresource.html"
{noformat}

This is hardly surprising given that the URI is clearly malformed.
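
For anyone who wants to reproduce the dump above without a debugger, here is a minimal sketch that uses only java.net.URI. The class name UriComponents and the helper parsesAsServerBased are made up for illustration and are not part of HttpClient. The getters should report the same authority, host, port and path as the dump, and URI.parseServerAuthority() can be used to detect that the authority was not parsed as host[:port].

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriComponents {

    /**
     * Hypothetical helper: returns false when the URI has an authority that
     * java.net.URI cannot parse as server-based (userinfo@host:port).
     * URIs with no authority at all are reported as true.
     */
    static boolean parsesAsServerBased(URI uri) {
        try {
            uri.parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // The URI from the issue report: no '/' between the port and the intended path
        URI uri = new URI("http://www.example.com:80somepath/someresource.html");

        System.out.println("authority = " + uri.getAuthority());
        System.out.println("host      = " + uri.getHost());
        System.out.println("port      = " + uri.getPort());
        System.out.println("path      = " + uri.getPath());
        System.out.println("server-based authority: " + parsesAsServerBased(uri));
    }
}
```

A caller that prefers to fail fast on such input could check parsesAsServerBased() (or simply a null host) before building the request.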

The only option we have is to implement a custom, more lenient URI parser. Leniency in URI parsing will come at the cost of having to parse every request URI twice. I am not sure it is worth it.
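
To make the trade-off concrete, here is a rough sketch of what a lenient second pass over the authority could look like. It is an assumption-laden illustration, not the URIUtils implementation and not the full custom parser mentioned above: the class LenientHostSketch, the method extractHostPort and the HOST_PORT pattern are all hypothetical. It only re-scans the authority when java.net.URI leaves the host unset.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientHostSketch {

    // Hypothetical pattern: optional userinfo, then a host, then an optional
    // ':' followed by up to five digits. Anything after the digits is ignored.
    private static final Pattern HOST_PORT =
            Pattern.compile("^(?:[^@]*@)?([^:@]+)(?::(\\d{1,5}))?");

    /**
     * Returns "host:port" (port is -1 when absent), or null when the URI
     * carries no usable authority.
     */
    static String extractHostPort(URI uri) {
        // First pass: accept whatever java.net.URI parsed strictly
        if (uri.getHost() != null) {
            return uri.getHost() + ":" + uri.getPort();
        }
        // Second pass: scan the raw authority leniently
        String authority = uri.getAuthority();
        if (authority == null) {
            return null;
        }
        Matcher m = HOST_PORT.matcher(authority);
        if (!m.find()) {
            return null;
        }
        int port = m.group(2) != null ? Integer.parseInt(m.group(2)) : -1;
        return m.group(1) + ":" + port;
    }
}
```

Note that for the URI in question this kind of leniency yields www.example.com and 80 and silently drops "somepath", which is exactly the behavior the reporter finds surprising.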

Oleg

> Quirky Behavior in URIUtils leads to Improper Request Execution
> ---------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1486
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1486
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 4.3.3
>            Reporter: William Porter
>            Priority: Minor
>
> While executing an HttpUriRequest with a CloseableHttpClient, malformed URIs can lead to HTTP requests being executed for unexpected resources.  The root issue is in the extractHost() method in URIUtils, and is demonstrated by the following example.
> {code:title=Main.java|borderStyle=solid}
> import java.io.IOException;
> import java.net.URI;
> import java.net.URISyntaxException;
> import org.apache.http.HttpHost;
> import org.apache.http.HttpResponse;
> import org.apache.http.client.ClientProtocolException;
> import org.apache.http.client.HttpClient;
> import org.apache.http.client.methods.HttpGet;
> import org.apache.http.client.methods.HttpUriRequest;
> import org.apache.http.client.utils.URIUtils;
> import org.apache.http.impl.client.HttpClientBuilder;
> import org.apache.log4j.BasicConfigurator;
> import org.junit.Assert;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class Main {
> 	
> 	private static final Logger LOG = LoggerFactory.getLogger(Main.class);
> 	public static void main(String [] args) {
> 		
> 		// Set up Log4J logging
> 		BasicConfigurator.configure();
> 		
> 		try {
> 			
> 			// The following is a strange URI string that is possibly a typo that
> 			// doesn't include the / between the authority and the 'intended' path
> 			final String strangeUriString = "http://www.example.com:80somepath/someresource.html";
> 			// Whereas it doesn't necessarily seem like strange behavior to resolve the
> 			// host and port as www.example.com and 80 from the authority, it can have unintended
> 			// consequences at higher levels of indirection
> 			Assert.assertEquals(new HttpHost("www.example.com", 80), URIUtils.extractHost(new URI(strangeUriString)));
> 			
> 			// Now we construct a request with the strange URI String
> 			HttpUriRequest request = new HttpGet(strangeUriString);
> 			
> 			// We create a CloseableHttpClient to execute the request
> 			final HttpClientBuilder builder = HttpClientBuilder.create();
> 			HttpClient client = builder.build();
> 			
> 			// Here, the request is executed, but is actually a GET /someresource.html
> 			// on www.example.com:80 since part of the intended path was considered part 
> 			// of the authority by the URI class, but disregarded by URIUtils
> 			final HttpResponse response = client.execute(request);
> 			LOG.info("Response: {}", response.getStatusLine().toString());
> 			
> 			
> 		} catch (final URISyntaxException e) {
> 			LOG.error("UriSyntaxException: {}", e.getMessage());
> 		} catch (final ClientProtocolException e) {
> 			LOG.error("ClientProtocolException: {}", e.getMessage());
> 		} catch (final IOException e) {
> 			LOG.error("IOException: {}", e.getMessage());
> 		}
> 		
> 	}
> }
> {code}
> This bug may have been introduced by the fix for https://issues.apache.org/jira/browse/HTTPCLIENT-1166.  It might be advantageous to throw an exception in this case rather than be lenient with the host and port parsing, but further discussion might be merited based on the comments in the aforementioned issue.
> Here is some debug output to show the request is actually a GET /someresource.html
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "GET /someresource.html HTTP/1.1[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Host: www.example.com:80[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "User-Agent: Apache-HttpClient/4.3.3 (java 1.5)[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "[\r][\n]"


