Posted to user@nutch.apache.org by bikram <bi...@yahoo.com> on 2007/08/22 09:27:06 UTC

Re: WIN XP PRO -Djava.protocol* file:///c:/folder/ Crawling Parents

Hi Vadim B  

I am getting the same error:

org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=smb

Were you able to fix this error?

If so, could you please tell me what you did to clear it?

I have already posted all the details here:

http://www.nabble.com/Windows-Share-Crawling---searching-tf4277499.html#a12175266

I am using Linux, not Cygwin on Windows.

Thanks,
Bikram


Hi,

I am working on the same issue as you. So far I can crawl file:///C:/*, but
I am stuck on the smb part. It looks to me as if this plugin isn't working
properly and needs to be updated for the newer version of Nutch.

The error I get differs slightly from yours:

2007-05-25 18:06:29,573 INFO  fetcher.Fetcher - fetching smb://mobidick/test/
2007-05-25 18:06:29,573 INFO  fetcher.Fetcher - fetch of smb://mobidick/test/ failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=smb

I will dig into plugin-smb and try to narrow down the problem. Maybe we
can work together to get a quick solution.
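As far as I can tell, ProtocolNotFound is thrown when Nutch finds no active protocol plugin for the URL's scheme. One thing worth ruling out first is that the SMB plugin is simply not loaded: Nutch only activates plugins whose id matches the plugin.includes regular expression in nutch-site.xml. The sketch below assumes the plugin's id is protocol-smb and that the whole id must match the expression; the include strings are made-up examples, not Nutch's defaults, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: checks whether a plugin id would be selected by a
 * plugin.includes / plugin.excludes pair of regular expressions.
 * Assumption: the whole id must match (Matcher.matches()).
 */
public class PluginIncludesCheck {

    static boolean isActive(String pluginId, String includes, String excludes) {
        boolean included = Pattern.compile(includes).matcher(pluginId).matches();
        // An empty excludes expression means "exclude nothing".
        boolean excluded = !excludes.isEmpty()
                && Pattern.compile(excludes).matcher(pluginId).matches();
        return included && !excluded;
    }

    public static void main(String[] args) {
        // Example values only; compare against your own nutch-site.xml.
        String withoutSmb = "protocol-(http|file)|urlfilter-regex|parse-(text|html)";
        String withSmb = "protocol-(http|file|smb)|urlfilter-regex|parse-(text|html)";
        System.out.println(isActive("protocol-smb", withoutSmb, "")); // false
        System.out.println(isActive("protocol-smb", withSmb, ""));    // true
    }
}
```

If the id does not match, adding smb to the protocol alternatives in plugin.includes (and checking that the plugin directory sits under plugin.folders) is the first thing to try.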



---SNIP---

# accept hosts in MY.DOMAIN.NAME
# Standard +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
+^file:///C:/Policies/ <<-- Why did you put this here? It doesn't make sense:
the +^(file|smb) line above already matches these URLs, so this line is
never reached
---SNIP ---
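On the filter-order point: as I understand the regex URL filter, rules are tried top to bottom and the first pattern that matches decides whether the URL is accepted (+) or rejected (-). This hypothetical Java sketch models that behavior (it is not Nutch's own filter class) and shows why a narrower + rule placed after a broader one is never consulted. The pattern ^(file|smb): stands in for the broader line, whose exact text was snipped.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical first-match-wins URL filter modelled on the +/- rule file format. */
public class FirstMatchFilter {

    /** One rule line: '+' accepts, '-' rejects; the rest is a regular expression. */
    record Rule(boolean accept, Pattern pattern, String source) {
        static Rule parse(String line) {
            return new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1)), line);
        }
    }

    /** Returns the first rule whose pattern is found in the URL, or null if none match. */
    static Rule firstMatch(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                Rule.parse("+^(file|smb):"),          // hypothetical broad rule
                Rule.parse("+^file:///C:/Policies/"), // shadowed by the rule above
                Rule.parse("-."));                    // reject everything else
        Rule hit = firstMatch(rules, "file:///C:/Policies/index.html");
        System.out.println(hit.source()); // prints +^(file|smb):
    }
}
```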

---SNIP ---
2007-05-24 14:04:22,000 WARN  crawl.PartitionUrlByHost - Malformed URL: 'smb://sql1/Sales/DATA/'
// Did you quote the URL, or is it displayed like this in the logs? I don't
get this error.
---SNIP ---
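About the Malformed URL warning: java.net.URL throws MalformedURLException ("unknown protocol") when it cannot find a URLStreamHandler for a scheme, so a warning like this is what I would expect if the JCIFS smb handler is not visible to the JVM. That link to PartitionUrlByHost is my assumption, not something verified here. A minimal check (the class name is hypothetical):

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Shows how java.net.URL reacts to a scheme it has no handler for. */
public class SmbUrlCheck {

    /** Returns null if the URL parses, otherwise the exception message. */
    static String parseError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // With no smb handler available this should print an "unknown protocol" message.
        System.out.println(parseError("smb://sql1/Sales/DATA/"));
        // file: has a built-in handler, so this prints null.
        System.out.println(parseError("file:///C:/Policies/"));
    }
}
```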

Try this in the main method of org.apache.nutch.crawl.Crawl:

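  public static void main(String args[]) throws Exception {
    // new: look for URL handlers under the "jcifs" package prefix, so smb:// URLs
    // can be served by jcifs.smb.Handler (the jcifs jar must be on the classpath)
    System.setProperty("java.protocol.handler.pkgs", "jcifs");
    // new: log the property value to confirm it was set
    LOG.info("SMB Info: " + System.getProperty("java.protocol.handler.pkgs"));
    // new: log a PropertyPermission for this property (only prints its description)
    LOG.info("SMB Info: " + new java.util.PropertyPermission("java.protocol.handler.pkgs", "read,write").toString());
    if (args.length < 1) {
      System.out.println("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
      return;
    }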
---SNIP---
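Some background on the property set in the snippet above: according to the java.net.URL documentation, java.protocol.handler.pkgs holds a |-separated list of package prefixes, and for a URL with protocol smb the JVM tries to load <prefix>.smb.Handler for each prefix before falling back to its built-in handlers. With the value jcifs that is jcifs.smb.Handler, the handler class that ships with JCIFS. This sketch (hypothetical class name) lists the candidate class names and reports whether each can be loaded, which is a quick way to see whether the JCIFS jar is actually visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists the handler class names derived from java.protocol.handler.pkgs for a protocol. */
public class HandlerLookup {

    static List<String> candidateHandlers(String pkgsProperty, String protocol) {
        List<String> names = new ArrayList<>();
        if (pkgsProperty != null) {
            // Prefixes are separated by '|'; surrounding whitespace is ignored.
            for (String prefix : pkgsProperty.split("\\|")) {
                String p = prefix.trim();
                if (!p.isEmpty()) {
                    names.add(p + "." + protocol + ".Handler");
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.setProperty("java.protocol.handler.pkgs", "jcifs");
        String pkgs = System.getProperty("java.protocol.handler.pkgs");
        for (String name : candidateHandlers(pkgs, "smb")) {
            System.out.println(name); // jcifs.smb.Handler
            try {
                Class.forName(name);
                System.out.println("  loadable from the class path");
            } catch (ClassNotFoundException e) {
                System.out.println("  NOT found on the class path");
            }
        }
    }
}
```

One caveat, as far as I know: the JDK resolves these handler classes through the system class loader, so a jcifs jar that only sits in a Nutch plugin's lib directory may not be found. Putting it on the main classpath is worth trying.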

Also check out this article on Java protocol handlers:
http://java.sun.com/developer/onlineTraining/protocolhandlers/
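The linked article also covers the programmatic route: instead of the system property, an application can install a URLStreamHandlerFactory with URL.setURLStreamHandlerFactory. The sketch below registers a hypothetical stub handler for smb that only makes smb:// URLs parse and cannot open connections; a real setup would presumably hand out the JCIFS handler instead, which is an untested assumption here.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

/** Registers a stub "smb" handler through a URLStreamHandlerFactory. */
public class SmbFactoryDemo {

    /** Hypothetical stub: URL parsing works, opening a connection does not. */
    static class StubSmbHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("stub handler cannot open " + u);
        }
    }

    /** Returns the stub for "smb" and null otherwise, so the JDK uses its built-in handlers. */
    static final URLStreamHandlerFactory FACTORY =
            protocol -> "smb".equals(protocol) ? new StubSmbHandler() : null;

    private static boolean installed = false;

    /** The factory can be set only once per JVM; a second call throws an Error. */
    static synchronized void install() {
        if (!installed) {
            URL.setURLStreamHandlerFactory(FACTORY);
            installed = true;
        }
    }

    public static void main(String[] args) throws Exception {
        install();
        URL url = new URL("smb://mobidick/test/");
        System.out.println(url.getHost()); // mobidick
        System.out.println(url.getPath()); // /test/
    }
}
```

Because the factory is global and one-shot, the install() guard keeps repeated calls from failing.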





-- 
View this message in context: http://www.nabble.com/WIN-XP-PRO--Djava.protocol*-file%3A---c%3A-folder--Crawling-Parents-tf3809966.html#a12269503
Sent from the Nutch - User mailing list archive at Nabble.com.