
Posted to user@nutch.apache.org by Mouad <el...@gmail.com> on 2010/02/11 05:17:40 UTC

error while crawling

Hello,
I installed Nutch on Windows, and everything went well until I wanted to
crawl a website.
I created the urls file in the Nutch directory by typing this line:
echo 'http://dawahweb.net' > urls
I could not create a WebDB when I tried typing admin db -create.
I received this log:
crawl started in: crawl-tinysite
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawl-tinysite/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : C:/cygwin/home/Mouad&Sibel/nutch-0.9/urls
	at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:138)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)

Can anyone help, please?

Mouad
-- 
View this message in context: http://old.nabble.com/error-while-crawling-tp27542153p27542153.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: error while crawling

Posted by reinhard schwab <re...@aon.at>.

Nutch expects "urls" to be a directory, not a file.
Create a directory named "urls", create a file with any name you like
inside it, and add the URLs you want to crawl to that file, one per line.
Your log shows the injector looking for that directory:

Injector: urlDir: urls

Input path doesnt exist : C:/cygwin/home/Mouad&Sibel/nutch-0.9/urls
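A minimal shell sketch of that fix, assuming it is run from the Nutch installation directory under Cygwin. The seed file name seed.txt is only a hypothetical choice; any name should work.

```shell
#!/bin/sh
# Sketch: prepare a seed directory for the Nutch injector.
# Assumption: the current directory is the Nutch installation
# (e.g. /home/<user>/nutch-0.9 under Cygwin).

# `echo ... > urls` creates a plain FILE named urls, but the injector
# expects a DIRECTORY, so remove a stray file of that name first.
if [ -f urls ]; then
  rm urls
fi

# Create the seed directory (no error if it already exists).
mkdir -p urls

# The file name inside the directory is arbitrary (seed.txt is hypothetical).
# List one URL per line.
echo 'http://dawahweb.net' > urls/seed.txt
```

With the directory in place, the crawl can be restarted with the settings shown in the log, for example bin/nutch crawl urls -dir crawl-tinysite -depth 1 -threads 10.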

