Posted to user@nutch.apache.org by J S <ve...@hotmail.com> on 2005/06/13 10:15:49 UTC

errors reported for working pages

Hi,

Can anyone help me with the following problem? My crawl.log contains lots 
of messages like the ones below, yet the same URLs load fine when I test 
them in my browser. Is there a regular expression I need to update 
somewhere? For example, one of the URLs below has a space in it, so I was 
thinking I might need to change or add a line in crawl-urlfilter.txt.


fetch of 
http://planetbp.bp.com/general/aptrix/bani.nsf/Content/XXXXPS%5FMB%5F090605%5CXXXXps%5FManagement+Briefing%5F090605
failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400

fetch of 
http://planetbp.bp.com/general/aptrix/aptrix.nsf/Content/BP websites 
failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400


fetch of 
http://planetbp.bp.com/general/aptrix/aptcsops.nsf/Content/GoHi+Services+Home%5CSocial 
failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400


fetch of 
http://planetbp.bp.com/general/aptrix/aptppl.nsf/Content/Training+Home%5CBusiness+Tools%5CPatrol+Medical 
failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 500
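
To illustrate the space issue mentioned above: browsers usually 
percent-encode a raw space as %20 before sending the request, which might 
explain why the URL works there but not in the crawl. The snippet below is 
only a rough sketch in plain Python, not Nutch code; encode_spaces is a 
hypothetical helper name, and whether this encoding actually cures the 400 
errors is an assumption I have not verified.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_spaces(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the URL path.

    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    parts = urlsplit(url)
    # '%' and '+' are kept in the safe set so that escapes already present
    # in the URL (e.g. %5F, %5C) and existing '+' signs are left untouched.
    path = quote(parts.path, safe="/%+")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


fixed = encode_spaces(
    "http://planetbp.bp.com/general/aptrix/aptrix.nsf/Content/BP websites"
)
print(fixed)
```

URLs that are already escaped, like the other three in the log, pass 
through this helper unchanged, so the space would only account for the 
second failure.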