Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2008/09/02 23:00:05 UTC

invalid urls

Hi,
 
When I run a crawl on our intranet (which runs on a Lotus Notes Domino server, hence the strange URLs), I get back a few error messages, most of them in the format below.
 
fetch of http://planetba.baplc.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/VideoJavaScript/$FILE/)){this.addVariable( failed with: java.lang.IllegalArgumentException: Invalid uri 'http://planetba.baplc.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/VideoJavaScript/$FILE/)){this.addVariable(': escaped absolute path not valid
 
fetch of http://planetba.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home%5CPeople+%26+Training%5CAircraft+Maintenance+Training+%E2%80%93+A320+Single+Aisle+Family failed with: Http code=500, url=http://planetba.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home%5CPeople+%26+Training%5CAircraft+Maintenance+Training+%E2%80%93+A320+Single+Aisle+Family
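The first failure above can be roughly reproduced outside Nutch. This is only a sketch: the class name UriCheck and its two helper methods are made up for illustration, and java.net.URI is used as a stand-in, which is not necessarily the class that raises the exception inside Nutch's HTTP plugin. It checks whether a URL string parses as-is, and shows the path percent-encoded by the multi-argument URI constructor.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriCheck {
    // Returns true if java.net.URI accepts the string exactly as given.
    static boolean parses(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Rebuilds the URI from its parts. The multi-argument constructor
    // percent-encodes characters that are not legal in a path, such as '{'.
    static String encodePath(String scheme, String host, String path)
            throws URISyntaxException {
        return new URI(scheme, host, path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String host = "planetba.baplc.com";
        String path = "/general/aptrix/aptrix.nsf/AttachmentsByTitle/"
                + "VideoJavaScript/$FILE/)){this.addVariable(";

        // Raw form, as it appears in the fetch error.
        System.out.println(parses("http://" + host + path));
        // Same URL with the illegal path characters escaped.
        System.out.println(encodePath("http", host, path));
    }
}
```

With this check the raw first URL is rejected because of the unescaped '{', while the second URL is already fully escaped and parses, so its Http code=500 is a response from the server rather than a parsing failure.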
 
Is there anything I can configure in Nutch to handle these URLs without filtering them out? They do appear to be legitimate pages.
 
Thanks for any help.
 
Rgds,
 
Ed.