Posted to user@nutch.apache.org by Nisha Aggarwal <Ni...@infosys.com> on 2008/08/19 08:26:42 UTC

Regarding --- Error: INVALID URI--- Escaped absolute path not valid

Hi,

I am crawling a site and getting the following error for some documents.
However, when I open one of these documents directly, I can access its contents.

The log file shows the following:

2008-08-19 09:37:20,609 INFO  fetcher.Fetcher - fetching http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc
2008-08-19 09:37:21,593 DEBUG httpclient.Http - Pre-configured credentials with scope - host: finacleportal; port: 80; found for url: http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc
2008-08-19 09:37:21,593 ERROR httpclient.Http - java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc': escaped absolute path not valid
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:80)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:145)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
2008-08-19 09:37:21,593 INFO  fetcher.Fetcher - fetch of http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc failed with: java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc': escaped absolute path not valid

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2008-08-19 09:53:08,609 INFO  fetcher.Fetcher - fetching http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm
2008-08-19 09:53:09,609 DEBUG httpclient.Http - Pre-configured credentials with scope - host: finacleportal; port: 80; found for url: http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm
2008-08-19 09:53:09,609 ERROR httpclient.Http - java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm': escaped absolute path not valid
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:80)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:145)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
2008-08-19 09:53:09,609 INFO  fetcher.Fetcher - fetch of http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm failed with: java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm': escaped absolute path not valid


The same error occurs for many other files.
Please advise.
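Both failing URLs contain characters that are not legal unescaped in a URI path: square brackets in "UNIX-resume[1].doc" and a space in "Central Banks". Commons HttpClient's HttpMethodBase constructor treats the string it receives as already escaped, which would explain why it rejects these URLs while a browser, which encodes them silently, opens the same documents. A minimal sketch, assuming the fix is to percent-encode the path before the URL reaches the fetcher, using only the JDK's java.net.URI. The class EscapeFetchUrl and its helper escapePath are hypothetical and are not part of Nutch or Commons HttpClient. The sketch also assumes URLs without a query string or fragment.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EscapeFetchUrl {

    /**
     * Hypothetical helper (not a Nutch or HttpClient API): percent-encodes path
     * characters such as spaces and square brackets. Assumes an absolute URL of
     * the form scheme://authority/path with no query string or fragment.
     */
    static String escapePath(String rawUrl) {
        int schemeEnd = rawUrl.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Not an absolute URL: " + rawUrl);
        }
        String scheme = rawUrl.substring(0, schemeEnd);
        int authorityStart = schemeEnd + 3;
        int pathStart = rawUrl.indexOf('/', authorityStart);
        String authority = pathStart < 0
                ? rawUrl.substring(authorityStart)
                : rawUrl.substring(authorityStart, pathStart);
        String path = pathStart < 0 ? "/" : rawUrl.substring(pathStart);
        try {
            // The five-argument URI constructor quotes every character that is
            // illegal in a path, e.g. ' ' -> %20, '[' -> %5B, ']' -> %5D.
            return new URI(scheme, authority, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot escape " + rawUrl, e);
        }
    }

    public static void main(String[] args) {
        String[] failing = {
            "http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc",
            "http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm"
        };
        for (String url : failing) {
            // Prints the percent-encoded form of each URL from the log above.
            System.out.println(escapePath(url));
        }
    }
}
```

For the two URLs in the log this yields ".../UNIX-resume%5B1%5D.doc" and ".../Central%20Banks/Jamaica_.htm". Note that the multi-argument java.net.URI constructors always quote '%', so this should only be applied to URLs that are not already escaped; otherwise an existing %20 would become %2520.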


Regards

Nisha Aggarwal

