Posted to dev@hc.apache.org by "Noah Levitt (JIRA)" <ji...@apache.org> on 2012/10/02 02:25:08 UTC

[jira] [Commented] (HTTPCLIENT-900) Don't enforce URI syntax

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467356#comment-13467356 ] 

Noah Levitt commented on HTTPCLIENT-900:
----------------------------------------

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4939847
"technically-illegal-but-usually-functional URIs appear all over... any moderately sized crawl of the web will encounter hundreds of such URIs. Other dominant net software, such as web browsers, already tolerate such URIs. So to match commonly expected behavior, and support real-world net applications"... In other words, such applications can't use java.net.URI.

The use of java.net.URI is preventing Heritrix, the open source web crawler, from moving to HttpClient. Using only HttpCore is an interesting idea that I will look into. But I already have code that ports Heritrix to HttpClient, so I'd like to prepare a patch for this issue.
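A minimal sketch, using only the JDK, of the strictness being described: the single-string java.net.URI constructor throws URISyntaxException for any raw character outside the RFC 2396 grammar, even when browsers would happily send the URL. The example.com URLs are made-up samples; the last one is the URL from the issue quoted below.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StrictUriDemo {
    // Returns true if java.net.URI accepts the string exactly as given.
    static boolean acceptedByJavaNetUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/plain/path",              // legal syntax
            "http://example.com/a b",                     // raw space in path
            "http://example.com/search?q=a|b",            // raw '|' in query
            "http://www.foo.bar/foobar?${APPL}=hetekaue"  // raw '{' and '}' (HTTPCLIENT-900)
        };
        for (String s : samples) {
            System.out.println(acceptedByJavaNetUri(s) + "  " + s);
        }
    }
}
```

Only the first sample is accepted; a crawler that routes every discovered link through this constructor has to drop or rewrite the others.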

> HttpClient 3.x codeline has its own URI implementation, which has been the single largest source of issues/bugs. I am, for one, very reluctant to repeat the same mistake.

To address this concern, org.apache.http.URI could simply wrap a java.net.URI, carrying along its validation. The key difference would be that the class is not final, so users of the library, such as Heritrix, could subclass it and override that validation.
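A minimal sketch of what such a non-final type might look like. HttpUri and LenientHttpUri are hypothetical names, not part of HttpClient. The sketch keeps the raw string and delegates only the validation step to java.net.URI, on the assumption that a lenient subclass could not construct a java.net.URI instance from an illegal string anyway.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical non-final URI type: by default it reuses java.net.URI's
 * validation, but subclasses may replace that check.
 */
public class HttpUri {
    private final String raw;

    public HttpUri(String raw) throws URISyntaxException {
        validate(raw);
        this.raw = raw;
    }

    /** Default: strict RFC 2396 validation borrowed from java.net.URI. */
    protected void validate(String raw) throws URISyntaxException {
        new URI(raw);
    }

    /** The exact string that would be written to the request line. */
    public String toRequestString() {
        return raw;
    }

    @Override
    public String toString() {
        return raw;
    }
}

/** Hypothetical lenient subclass a crawler such as Heritrix could supply. */
class LenientHttpUri extends HttpUri {
    LenientHttpUri(String raw) throws URISyntaxException {
        super(raw);
    }

    @Override
    protected void validate(String raw) throws URISyntaxException {
        // Reject only characters that can never appear on an HTTP/1.1
        // request line; everything else is passed through untouched.
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ' ' || c == '\r' || c == '\n') {
                throw new URISyntaxException(raw, "whitespace not allowed", i);
            }
        }
    }
}
```

Calling the overridable validate() from the constructor is normally discouraged; it is safe here only because neither implementation reads subclass fields.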
                
> Don't enforce URI syntax
> ------------------------
>
>                 Key: HTTPCLIENT-900
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-900
>             Project: HttpComponents HttpClient
>          Issue Type: Improvement
>          Components: HttpClient
>    Affects Versions: 4.0 Final
>            Reporter: Marko Asplund
>            Priority: Minor
>             Fix For: Future
>
>
> I'm trying to use HttpComponents Client for fetching data from a web site.
> I've run into problems that seem to be related to the way the request URL query parameters are handled on the server side.
> The service doesn't encode unsafe characters (e.g. '{' and '}') in response URLs.
> Also, when these characters are encoded on the client before the request is issued, the service gives incorrect responses.
> The URLs are of the following form:
> http://www.foo.bar/foobar?${APPL}=hetekaue
> On the other hand, HC Client doesn't allow me to send requests with invalid query syntax
> (the HttpGet(String) constructor throws an IllegalArgumentException caused by a URISyntaxException).
> It would be good if HC Client could also be used in situations like this.
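A minimal sketch, using only the JDK, of the client-side encoding the reporter describes: the multi-argument java.net.URI constructors quote illegal characters instead of rejecting them, so the query from the issue becomes $%7BAPPL%7D=hetekaue. Passing such a URI to HttpGet would put the encoded form on the wire, which is the form the reporter says this particular service answers incorrectly.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EncodedQueryDemo {
    // Builds the issue's URL component by component; the 7-argument
    // constructor percent-encodes characters that are illegal in each part.
    static URI buildEncoded() throws URISyntaxException {
        return new URI("http", null, "www.foo.bar", -1,
                       "/foobar", "${APPL}=hetekaue", null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = buildEncoded();
        System.out.println(uri);               // http://www.foo.bar/foobar?$%7BAPPL%7D=hetekaue
        System.out.println(uri.getRawQuery()); // $%7BAPPL%7D=hetekaue
        System.out.println(uri.getQuery());    // ${APPL}=hetekaue
    }
}
```

'$' and '=' are legal query characters and are left alone; only '{' and '}' are escaped.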
