Posted to user@nutch.apache.org by J S <ve...@hotmail.com> on 2005/06/13 10:15:49 UTC
errors reported for working pages
Hi,
Can anyone help me with the following problem? My crawl.log contains lots of
messages such as those below, yet the URLs work fine when I test them in my
browser. Is there a regular expression I need to update somewhere? For
example, one of the URLs below has a space in it, so I was thinking I might
need to change or add a line in crawl-urlfilter.txt.
fetch of http://planetbp.bp.com/general/aptrix/bani.nsf/Content/XXXXPS%5FMB%5F090605%5CXXXXps%5FManagement+Briefing%5F090605 failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400
fetch of http://planetbp.bp.com/general/aptrix/aptrix.nsf/Content/BP websites failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400
fetch of http://planetbp.bp.com/general/aptrix/aptcsops.nsf/Content/GoHi+Services+Home%5CSocial failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400
fetch of http://planetbp.bp.com/general/aptrix/aptppl.nsf/Content/Training+Home%5CBusiness+Tools%5CPatrol+Medical failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 500
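
To test the theory about the space, something like the following could be used to percent-encode the path of a failing URL before retrying it by hand. This is only a rough sketch in Python, not anything from Nutch itself: encode_path is a hypothetical helper name, and it assumes the literal space is what the server rejects, which the log alone does not confirm. Existing %XX escapes and '+' characters are left untouched, so the other URLs pass through unchanged.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the URL path.

    Hypothetical helper for manual testing; not part of Nutch.
    """
    parts = urlsplit(url)
    # Keep '/', existing '%XX' escapes, and '+' exactly as they are.
    path = quote(parts.path, safe="/%+")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # The URL from the log that contains a literal space.
    print(encode_path(
        "http://planetbp.bp.com/general/aptrix/aptrix.nsf/Content/BP websites"
    ))
```

If the encoded form fetches cleanly outside the browser while the raw form returns 400, that would point at encoding rather than at the filter rules. It would not explain the other 400s or the 500, since those URLs contain no space.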