Posted to httpclient-users@hc.apache.org by niv seker <se...@gmail.com> on 2010/03/11 18:50:35 UTC

handling URISyntaxException

Hi,
I am using HttpClient in an application that downloads the pages of URLs
that appear in RSS feeds.
HttpClient uses java.net.URI, which in many cases is too strict for the
world it lives in.
URIs such as
http://gizmodo.com/5479150/youtube-pulls-the-original-rickroll-video-spurring-inevitable-wave-of-protest-rickrolls-[updated]
http://finance.yahoo.com/tech-ticker/wall-street's-memo-to-meredith-whitney:-you're-so-2009-399286.html?tickers=^dji,^gspc,gs,dia,spy,xlf,skf
are considered illegal (java.net.URI does not allow the square brackets in
the first path or the '^' characters in the second query string), and
trying to create a URI from them throws a URISyntaxException.
URLs like these are in common use, and popular web browsers visit them
without any trouble.

I was thinking of using a method that encodes every character that is not
already encoded, but that would still fail if the URI redirects to another
URI that is not legal
(something like the toURI method at
http://www.java2s.com/Code/Java/Network-Protocol/URIutilities.htm).
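One way to do that kind of pre-encoding with only the JDK is sketched below. It is not the java2s toURI method, and the LenientUris class and its create method are hypothetical names. The idea is that java.net.URL is far less strict than java.net.URI: URL is used only to split the string into components, and the multi-argument URI constructor then percent-encodes whatever is illegal in each component. Like any pre-encoding, it only covers URIs you build yourself, not redirect targets, which is the limitation noted above. Note also that URL's String constructor is deprecated in recent JDKs.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper, not part of HttpClient or the JDK. */
final class LenientUris {

    /**
     * Builds a java.net.URI from a URL string that new URI(String) rejects.
     * java.net.URL does not validate the characters in the path or query, so it
     * is used only to split the string into components; the multi-argument URI
     * constructor then percent-encodes characters that are illegal in each one.
     * Caveat: that constructor always quotes '%', so input that is already
     * partly encoded (e.g. "%20") ends up double-encoded ("%2520").
     */
    static URI create(String raw) {
        try {
            URL url = new URL(raw);
            return new URI(
                    url.getProtocol(),
                    url.getUserInfo(),
                    url.getHost(),
                    url.getPort(),   // -1 when the URL has no explicit port
                    url.getPath(),
                    url.getQuery(),  // null when there is no query string
                    url.getRef());   // null when there is no fragment
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert to URI: " + raw, e);
        }
    }

    public static void main(String[] args) {
        String[] feedLinks = {
            "http://gizmodo.com/5479150/youtube-pulls-the-original-rickroll-video-spurring-inevitable-wave-of-protest-rickrolls-[updated]",
            "http://finance.yahoo.com/tech-ticker/wall-street's-memo-to-meredith-whitney:-you're-so-2009-399286.html?tickers=^dji,^gspc,gs,dia,spy,xlf,skf"
        };
        for (String link : feedLinks) {
            // Prints the escaped form, e.g. ...rickrolls-%5Bupdated%5D
            System.out.println(create(link).toASCIIString());
        }
    }
}
```

Because the quoting is done per component, '[' in the path becomes %5B and '^' in the query becomes %5E, while characters such as ':' and the apostrophe, which are legal in a path, are left alone.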

Is there any workaround for this issue?
Is there some way to specify a URIResolver object that would help
HttpClient handle these illegal URLs?

(While searching for a solution, I found a bug report someone filed with the
Java team, asking for a more lenient implementation of URI:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4939847
Not really helpful, but the reporter describes the issue pretty well.)

Cheers,
Niv

Re: handling URISyntaxException

Posted by Oleg Kalnichevski <ol...@apache.org>.
niv seker wrote:
> Hi,
> I am using HttpClient in an application that downloads the pages of URLs
> that appear in RSS feeds.
> HttpClient uses java.net.URI, which in many cases is too strict for the
> world it lives in.
> URIs such as
> http://gizmodo.com/5479150/youtube-pulls-the-original-rickroll-video-spurring-inevitable-wave-of-protest-rickrolls-[updated]
> http://finance.yahoo.com/tech-ticker/wall-street's-memo-to-meredith-whitney:-you're-so-2009-399286.html?tickers=^dji,^gspc,gs,dia,spy,xlf,skf
> are considered illegal, and trying to create a URI from them throws a
> URISyntaxException.
> URLs like these are in common use, and popular web browsers visit them
> without any trouble.
> 
> 

HttpClient 3.1 has its own URI API, which literally NO one was willing
to work with or maintain. That is why it was not ported to 4.0 and was
replaced with the standard java.net.URI implementation.


> I was thinking of using a method that encodes every character that is not
> already encoded, but that would still fail if the URI redirects to another
> URI that is not legal
> (something like the toURI method at
> http://www.java2s.com/Code/Java/Network-Protocol/URIutilities.htm).
> 
> Is there any workaround for this issue?
> Is there some way to specify a URIResolver object that would help
> HttpClient handle these illegal URLs?
> 

I think it would be possible to provide a custom parser / factory for
URI objects that is more lenient than the standard one. However, as far
as I am concerned, this is a very low-priority issue. That said, if such
a feature is contributed to the project, I will happily incorporate it
into the official code line. There is already a JIRA ticket for this
feature:

https://issues.apache.org/jira/browse/HTTPCLIENT-900
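A lenient factory of the kind described above could look roughly like the following. This is a hypothetical, JDK-only sketch: LenientUriFactory, create and resolveRedirect are invented names, not HttpClient APIs, and wiring the factory into HttpClient's redirect handling is not shown. It tries the strict parser first and escapes only when that fails, so well-formed URIs pass through unchanged; the same fallback can be applied to a malformed Location header value, which is the redirect case raised in the question.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

/** Hypothetical lenient URI factory; not part of HttpClient or the JDK. */
final class LenientUriFactory {

    // Characters copied through unchanged: RFC 3986 unreserved characters plus
    // the reserved characters java.net.URI accepts outside an IPv6 host.
    // '[' and ']' are deliberately absent, so IPv6 literal hosts are not supported.
    private static final String ALLOWED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?#@!$&'()*+,;=";

    /** Strict parse first; on failure, escape illegal characters and parse again. */
    static URI create(String raw) {
        try {
            return new URI(raw);                      // well-formed input: use as-is
        } catch (URISyntaxException strictFailure) {
            try {
                return new URI(escapeIllegal(raw));   // lenient retry
            } catch (URISyntaxException stillInvalid) {
                throw new IllegalArgumentException("Cannot repair URI: " + raw, stillInvalid);
            }
        }
    }

    /** Percent-encodes (as UTF-8) every byte outside ALLOWED, keeping valid %XX escapes. */
    static String escapeIllegal(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if (b == '%' && i + 2 < bytes.length && isHex(bytes[i + 1]) && isHex(bytes[i + 2])) {
                out.append('%');                      // existing escape: leave it alone
            } else if (ALLOWED.indexOf(b) >= 0) {
                out.append((char) b);
            } else {
                out.append(String.format("%%%02X", b)); // e.g. '[' -> %5B, stray '%' -> %25
            }
        }
        return out.toString();
    }

    private static boolean isHex(byte b) {
        return (b >= '0' && b <= '9') || (b >= 'A' && b <= 'F') || (b >= 'a' && b <= 'f');
    }

    /** Resolves a possibly relative, possibly malformed Location value against the request URI. */
    static URI resolveRedirect(URI requestUri, String location) {
        // normalize() removes "." and ".." segments from the resolved path
        return requestUri.resolve(create(location)).normalize();
    }

    public static void main(String[] args) {
        System.out.println(create(
            "http://gizmodo.com/5479150/youtube-pulls-the-original-rickroll-video-spurring-inevitable-wave-of-protest-rickrolls-[updated]"));
        System.out.println(resolveRedirect(
            URI.create("http://example.com/feed/item"), "/story?ids=^a,^b"));
    }
}
```

Unlike splitting the string with java.net.URL, this keeps existing %XX escapes intact, but it escapes the whole string without knowing which component a character belongs to; for example, it cannot tell a literal '#' inside a query from the start of a fragment.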

Oleg


