Posted to dev@hc.apache.org by bu...@apache.org on 2005/10/05 14:02:43 UTC
DO NOT REPLY [Bug 36932] New: - httpclient not able to download certain urls
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36932>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=36932
Summary: httpclient not able to download certain urls
Product: HttpClient
Version: 3.0 RC2
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Commons HttpClient
AssignedTo: httpclient-dev@jakarta.apache.org
ReportedBy: hi_pkr@yahoo.com
CC: hi_pkr@yahoo.com
Hi guys,
I was using nutch-0.7 to crawl a site, but for certain URLs it threw the
following exception and the fetch failed:
java.lang.IllegalArgumentException: Invalid uri 'http://www.trw.com/suppliers/home/0,,5^1^5^5,00.html': escaped absolute path not valid
    at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
    at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
    at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:87)
    at org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:204)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:135)
However, this URL opens without problems in a browser, and it is a valid,
existing URL.
nutch-0.7 provides two HTTP protocol plugins: one built on Java and the other
built on httpclient (commons-httpclient-3.0-rc2.jar). The Java-based plugin is
able to download that URL, but the httpclient-based plugin throws the exception
above. Is there a bug in httpclient, or am I doing something wrong?
I would really appreciate it if someone could shed some light on this matter.
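One possible explanation, offered as an assumption rather than a confirmed diagnosis: the '^' character is not allowed unescaped in a URI path, and the message "escaped absolute path not valid" suggests the string is being parsed as an already-escaped URI. The sketch below illustrates the idea using java.net.URI from the standard library instead of httpclient's own URI class, so it runs without extra jars. The class name CaretEscapeSketch and its helper methods are hypothetical and are not part of nutch or httpclient.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CaretEscapeSketch {

    /** Returns true if the string parses as an already-escaped URI. */
    static boolean parsesAsEscapedUri(String uri) {
        try {
            // The single-argument constructor performs no escaping;
            // it rejects characters that are illegal in a URI.
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds a URI string from parts, percent-encoding illegal path characters. */
    static String escapePath(String scheme, String host, String rawPath) {
        try {
            // The multi-argument constructor quotes illegal characters
            // such as '^' in the path component.
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "http://www.trw.com/suppliers/home/0,,5^1^5^5,00.html";
        System.out.println(parsesAsEscapedUri(raw));

        String escaped = escapePath("http", "www.trw.com",
                "/suppliers/home/0,,5^1^5^5,00.html");
        System.out.println(escaped);
        System.out.println(parsesAsEscapedUri(escaped));
    }
}
```

If this assumption holds, the raw URL is rejected while the version with each '^' written as %5E is accepted, which would match a browser quietly escaping the character before sending the request.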
TIA (Thanks in Advance)
Pushpesh