Posted to user@nutch.apache.org by Arun Kaundal <ar...@gmail.com> on 2005/12/05 06:56:47 UTC
org.apache.nutch.protocol.ProtocolNotFound: protocol
Nutch Geeks,
I am facing a problem with parsing: the protocol for a particular type
of file is not found. How do I parse the content of those files?
What configuration changes are required (if any), or is it a problem with
a particular library? Please send your reply as soon as possible. Thanks a ton.
The complete log is attached in the errorlog.txt file.
051205 104856 fetching file:///F:/atsd/Crawl_Files/v4n.txt
051205 104856 fetching file:///F:/atsd/Crawl_Files/FetcherTask.html
051205 104856 fetch of file:///F:/atsd/Crawl_Files/FetcherTask.html failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
051205 104856 fetch of file:///F:/atsd/Crawl_Files/v4n.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
051205 104856 Unable to parse [null].Reason is [java.net.MalformedURLException]
051205 104856 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
051205 104856 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
java.lang.NullPointerException
    at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:280)
    at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:254)
    at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:149)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:58)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:252)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:204)
java.lang.NullPointerException
    at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:280)
    at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:254)
    at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:149)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:58)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:252)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:204)
051205 104856 fetch okay, but can't parse file:///F:/atsd/Crawl_Files/v4n.txt, reason: failed(2,200): java.lang.NullPointerException
051205 104856 fetch okay, but can't parse file:///F:/atsd/Crawl_Files/FetcherTask.html, reason: failed(2,200): java.lang.NullPointerException