Posted to dev@hc.apache.org by bu...@apache.org on 2005/10/05 14:02:43 UTC

DO NOT REPLY [Bug 36932] New: - httpclient not able to download certain urls

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36932>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=36932

           Summary: httpclient not able to download certain urls
           Product: HttpClient
           Version: 3.0 RC2
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Commons HttpClient
        AssignedTo: httpclient-dev@jakarta.apache.org
        ReportedBy: hi_pkr@yahoo.com
                CC: hi_pkr@yahoo.com


Hi guys,

I was using nutch-0.7 to crawl a site, but for certain URLs it threw the
following exception and failed to fetch them:

java.lang.IllegalArgumentException: Invalid uri 'http://www.trw.com/suppliers/home/0,,5^1^5^5,00.html': escaped absolute path not valid
	at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
	at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
	at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:87)
	at org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:204)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:135)


Yet this URL opens fine in a browser, and it is a valid, existing URL.
nutch-0.7 provides two HTTP protocol plugins: one built on Java and one built
on HttpClient (commons-httpclient-3.0-rc2.jar). The Java-based plugin is able
to download this URL, but the HttpClient-based plugin throws the exception
above. Is this a bug in HttpClient, or am I doing something wrong?

I would really appreciate it if someone could shed some light on this.

TIA (Thanks in Advance)
Pushpesh
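
One possible explanation for the exception above, offered as an assumption
rather than a confirmed diagnosis: the path contains the '^' character, which
RFC 2396 lists among the "unwise" characters that should be percent-encoded
(as %5E) in a URI. Browsers tend to tolerate it, while strict parsers reject
it. The sketch below uses the JDK's java.net.URI as a stand-in for a strict
parser; it does not call HttpClient, and whether HttpClient accepts the
escaped form has not been verified here. The class and method names
(CaretEscapeSketch, parsesStrictly, escapeCarets) are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch only: java.net.URI stands in for a strict URI parser.
class CaretEscapeSketch {

    // Hypothetical helper: true if java.net.URI accepts the string unchanged.
    public static boolean parsesStrictly(String uri) {
        try {
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: percent-encode only '^' (0x5E), leaving every
    // other character untouched. Not a general-purpose URI encoder.
    public static String escapeCarets(String uri) {
        return uri.replace("^", "%5E");
    }

    public static void main(String[] args) {
        String raw = "http://www.trw.com/suppliers/home/0,,5^1^5^5,00.html";
        String escaped = escapeCarets(raw);

        System.out.println("raw parses strictly:     " + parsesStrictly(raw));
        System.out.println("escaped form:            " + escaped);
        System.out.println("escaped parses strictly: " + parsesStrictly(escaped));
    }
}
```

If the crawler can rewrite URLs before handing them to the HTTP library,
escaping such characters first may be enough to avoid the exception. A more
complete fix would need to handle the other excluded characters as well.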
