Posted to user@nutch.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/07/27 08:56:35 UTC

cygwin - Input path doesnt exist

I've freshly installed a nutch nightly build on my laptop using an up-to-date cygwin.  Basically I just downloaded the .tar.gz, ran ant, and verified that $NUTCH_HOME/bin/nutch works (it gives me the help screen).  I set up nutch-site.xml and urls.txt and attempted to crawl.  However, I get an org.apache.hadoop.mapred.InvalidInputException.  The hadoop.log doesn't record the error; it only shows the crawl command I ran.  Has anyone seen this before?


$ nutch crawl /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt -dir /cygdrive/c/nutch-2007-07-26_04-01-20/content/sf911truth -depth 3 -topN 200
crawl started in: /cygdrive/c/nutch-2007-07-26_04-01-20/content/sf911truth
rootUrlDir = /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt
threads = 10
depth = 3
topN = 200
Injector: starting
Injector: crawlDb: /cygdrive/c/nutch-2007-07-26_04-01-20/content/sf911truth/crawldb
Injector: urlDir: /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt
        at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:138)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)





       

Re: cygwin - Input path doesnt exist

Posted by feran <fe...@whereistand.com>.
This is the problem:

Injector: urlDir: /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt

urls.txt is a file, not a directory.

Crawl takes a directory as its first argument, not a file. Inside that
directory, it looks for a flat file called urls, with no extension.

- feran_a
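
A minimal sketch of the layout described above, in case it helps. The directory name urls_dir, the relative ./content location and the example.com seed are placeholders I made up for illustration; only the crawl flags (-dir, -depth, -topN) come from the original command. The crawl command is printed rather than run, so the snippet works even where nutch is not on the PATH.

```shell
#!/bin/sh
# Hypothetical locations: adjust to your own install.
CONTENT_DIR="${CONTENT_DIR:-./content}"
SEED_DIR="$CONTENT_DIR/urls_dir"   # a directory, passed to crawl instead of urls.txt

# Create the seed directory and put a flat file named "urls" (no extension) in it.
mkdir -p "$SEED_DIR"
printf '%s\n' 'http://example.com/' > "$SEED_DIR/urls"   # placeholder seed URL

# Print the crawl command that points at the directory rather than at a file.
echo "nutch crawl $SEED_DIR -dir $CONTENT_DIR/sf911truth -depth 3 -topN 200"
```

If the printed command looks right for your setup, run it by hand from the same shell.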