Posted to user@nutch.apache.org by Arun Kaundal <ar...@gmail.com> on 2005/12/05 13:57:03 UTC

fetch of file:///F:/xxx/xxx/xxx.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file

  I am getting a "protocol not found" error. What configuration setting is
required in my case? Please suggest a solution soon; I have been waiting for
a reply to my post for a long time.

The log is below:
051205 181723 logging at INFO
051205 181723 fetching file:///F:/Atalntis_scheduler/Crawl_Files/FetcherTask.html
051205 181723 fetching file:///F:/Atalntis_scheduler/Crawl_Files/Voltix_4n_network.txt
051205 181723 fetch of file:///F:/Atalntis_scheduler/Crawl_Files/Voltix_4n_network.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
051205 181723 fetch of file:///F:/Atalntis_scheduler/Crawl_Files/FetcherTask.html failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
051205 181723 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
051205 181723 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
051205 181723 Parsing [file:///F:/Atalntis_scheduler/Crawl_Files/Voltix_4n_network.txt] with [org.apache.nutch.parse.text.TextParser@1f6ba0f]
051205 181723 Parsing [file:///F:/Atalntis_scheduler/Crawl_Files/FetcherTask.html] with [org.apache.nutch.parse.text.TextParser@1313906]

RE: fetch of file:///F:/xxx/xxx/xxx.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file

Posted by Jonathan Hoffman <jo...@gmail.com>.
This should help you:
http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6

-----Original Message-----
From: Arun Kaundal [mailto:arun.kaundal@gmail.com] 
Sent: Monday, December 05, 2005 11:23 PM
To: nutch-user@lucene.apache.org
Subject: Re: fetch of file:///F:/xxx/xxx/xxx.txt failed with:
org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file



Re: fetch of file:///F:/xxx/xxx/xxx.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file

Posted by Arun Kaundal <ar...@gmail.com>.
Jerome,
  Thanks for replying. How can I activate the protocol-file plugin? I am new
to Nutch, so please suggest a way. Thanks a ton once again.


On 12/5/05, Jérôme Charron <je...@gmail.com> wrote:
>
> It seems that you are trying to fetch some local files, but that the
> protocol-file plugin is not activated in your configuration.

Re: fetch of file:///F:/xxx/xxx/xxx.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file

Posted by Jérôme Charron <je...@gmail.com>.
It seems that you are trying to fetch some local files, but that the
protocol-file plugin is not activated in your configuration.

Regards

Jérôme
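One way to act on this, offered as a sketch rather than quoted from the Nutch documentation: copy the plugin.includes property from conf/nutch-default.xml into conf/nutch-site.xml and extend its value so it also matches protocol-file, for example by changing protocol-http to protocol-(http|file). The value is a regular expression over plugin ids. The Java sketch below assumes each id is matched against the whole expression (full-match semantics, as with String.matches); both values in it are illustrative and not the exact default of any particular Nutch release.

```java
import java.util.regex.Pattern;

/**
 * Sketch: reports which plugin ids a plugin.includes value would activate.
 * Assumption: each plugin id must fully match the includes regex.
 * Both values are illustrative; copy the real default from nutch-default.xml.
 */
public class PluginIncludesCheck {

    // Illustrative value that only enables the HTTP protocol plugin.
    static final String WITHOUT_FILE =
        "protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)";

    // Same value with protocol-file added through an alternation group.
    static final String WITH_FILE =
        "protocol-(http|file)|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)";

    /** Returns true if the plugin id is fully matched by the includes regex. */
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        for (String includes : new String[] {WITHOUT_FILE, WITH_FILE}) {
            System.out.println(includes);
            System.out.println("  protocol-http: " + isIncluded(includes, "protocol-http"));
            System.out.println("  protocol-file: " + isIncluded(includes, "protocol-file"));
        }
    }
}
```

Under that assumption, the first value leaves protocol-file out (which would produce exactly the ProtocolNotFound error for file: URLs), while the second includes both protocol plugins. The crawl has to be rerun after editing nutch-site.xml for the new value to be picked up.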




--
http://motrech.free.fr/
http://www.frutch.org/

Re: fetch of file:///F:/xxx/xxx/xxx.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file

Posted by Arun Kaundal <ar...@gmail.com>.
I am unable to understand what you mean. Would it be possible for you to send
me the configuration as an attachment?
With thanks


On 12/8/05, Hasan Diwan <ha...@gmail.com> wrote:

Re: fetch of file:///F:/xxx/xxx/xxx.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file

Posted by Hasan Diwan <ha...@gmail.com>.
On Dec 5, 2005, at 4:57 AM, Arun Kaundal wrote:

>   I am getting protocol not found error. What configuartionsetting  
> require for my case. Plz come up with solution soon, I am waiting  
> my posting from long time.
In your conf/crawl-urlfilter.txt:
# remove the word "file" from the next line, leaving -^(ftp|mailto):
-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
-[?*!@=]
+^http*://([a-z0-9]*\.)*/
+^https*://([a-z0-9]*\.)*/
# add this line
+^file:///*
-.
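A simplified Java model of how such a rule file is evaluated may help show why both the rule order and the comment placement matter. It is an assumption-based sketch, not Nutch's actual URL filter class: rules are tried top to bottom, the first pattern found anywhere in the URL decides, "+" accepts, "-" rejects, a URL matching no rule is rejected, and everything after the sign is treated as the regex. The rule lists are abbreviated from the filter file above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Simplified model of a crawl-urlfilter.txt rule file (an assumption,
 * not Nutch's own implementation): first matching rule wins.
 */
public class UrlFilterSketch {

    /** One rule: whether a match accepts (+) or rejects (-), and its pattern. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    UrlFilterSketch(String... lines) {
        for (String line : lines) {
            // Skip blank lines and whole-line comments.
            if (line.isEmpty() || line.charAt(0) == '#') {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Invalid first character: " + line);
            }
            // Everything after the sign becomes the regex, so a trailing
            // "# comment" on a rule line would become part of the pattern.
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first rule whose pattern is found in the URL decides; no match rejects. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "file:///F:/Atalntis_scheduler/Crawl_Files/Voltix_4n_network.txt";
        UrlFilterSketch before = new UrlFilterSketch("-^(file|ftp|mailto):", "+^file:///*", "-.");
        UrlFilterSketch after = new UrlFilterSketch("-^(ftp|mailto):", "+^file:///*", "-.");
        System.out.println("before edit: " + before.accepts(url)); // rejected by the first rule
        System.out.println("after edit:  " + after.accepts(url));  // accepted by +^file:///*
    }
}
```

Under these assumptions the file URL stays rejected as long as "file" appears in the first rule, even with +^file:///* added further down, because the earlier rule matches first. The URL filter only decides which URLs are accepted; it does not by itself enable the protocol-file plugin.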


Cheers,
Hasan Diwan <ha...@gmail.com>