Posted to user@nutch.apache.org by Nisha Aggarwal <Ni...@infosys.com> on 2008/08/19 08:26:42 UTC
Regarding --- Error: INVALID URI--- Escaped absolute path not valid
Hi,
I am crawling a site and getting the following error for some documents.
When I open these documents directly, I can access their contents.
The log file shows the following:
2008-08-19 09:37:20,609 INFO fetcher.Fetcher - fetching http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc
2008-08-19 09:37:21,593 DEBUG httpclient.Http - Pre-configured credentials with scope - host: finacleportal; port: 80; found for url: http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc
2008-08-19 09:37:21,593 ERROR httpclient.Http - java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc': escaped absolute path not valid
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:80)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:145)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
2008-08-19 09:37:21,593 INFO fetcher.Fetcher - fetch of http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc failed with: java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc': escaped absolute path not valid
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2008-08-19 09:53:08,609 INFO fetcher.Fetcher - fetching http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm
2008-08-19 09:53:09,609 DEBUG httpclient.Http - Pre-configured credentials with scope - host: finacleportal; port: 80; found for url: http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm
2008-08-19 09:53:09,609 ERROR httpclient.Http - java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm': escaped absolute path not valid
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:80)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:145)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
2008-08-19 09:53:09,609 INFO fetcher.Fetcher - fetch of http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm failed with: java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm': escaped absolute path not valid
The same error occurs for many other files.
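One thing I notice is that the failing URLs all contain characters that are not allowed unescaped in a URI path ("[" and "]" in the first, a space in the second), which may be what "escaped absolute path not valid" refers to. Below is a minimal sketch of percent-encoding such a path with java.net.URI. It is not Nutch code: the class and method names are made up for illustration, and it assumes the URL has no query string, no fragment, and no existing % escapes.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch or Commons HttpClient.
class UrlEscapeSketch {

    // Splits "scheme://authority/path" by hand and lets the multi-argument
    // java.net.URI constructor quote characters that are illegal in a path.
    // Assumption: no query string, fragment, or pre-existing % escapes
    // (this constructor always quotes '%', so they would be double-escaped).
    static String escapePath(String rawUrl) {
        int schemeEnd = rawUrl.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Not an absolute URL: " + rawUrl);
        }
        String scheme = rawUrl.substring(0, schemeEnd);
        int pathStart = rawUrl.indexOf('/', schemeEnd + 3);
        String authority = pathStart < 0
                ? rawUrl.substring(schemeEnd + 3)
                : rawUrl.substring(schemeEnd + 3, pathStart);
        String path = pathStart < 0 ? "/" : rawUrl.substring(pathStart);
        try {
            return new URI(scheme, authority, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot escape: " + rawUrl, e);
        }
    }

    public static void main(String[] args) {
        // The two URLs from the log above.
        System.out.println(escapePath(
            "http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc"));
        System.out.println(escapePath(
            "http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm"));
    }
}
```

With this, the brackets are encoded as %5B and %5D and the space as %20. I do not know whether Nutch offers a way to apply such escaping before the fetch.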
Please advise.
Regards
Nisha Aggarwal