Posted to user@nutch.apache.org by Daniel Varela Santoalla <dv...@ecmwf.int> on 2006/07/12 19:31:44 UTC

Error running intranet crawl with 0.8.0-dev

Hello

I'm having this problem when trying to test intranet crawling. Could
anybody help with this? I've been trying a lot of things, but without results.
0.7.2 works fine, but I wanted to test the new version.

daniel@amorgen:~/tmp/devel/nutch-nightly/bin> ./nutch crawl urls/ -threads 5 -depth 10 -topN 1000 -dir crawl_results
Exception in thread "main" java.io.IOException: Input directory /var/tmp/daniel/devel/nutch-nightly/bin/crawl_results/crawldb/current in local is invalid.
         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
         at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

-- 

Daniel Varela Santoalla
European Centre for Medium-Range Weather Forecasts (ECMWF)

Re: Error running intranet crawl with 0.8.0-dev

Posted by manish kothari <ma...@universeinfosys.com>.
Hello Daniel,

From your description, please check whether your environment variables are
set properly, in particular NUTCH_HOME. Before that, please read the nutch
shell script (bin/nutch). I think that should solve your problem. If it
doesn't work, you can send me an email.

Bye

Re: Error running intranet crawl with 0.8.0-dev

Posted by Sami Siren <ss...@gmail.com>.
This error was fixed today in trunk, so the fix will be available in the
next nightly build.
--
 Sami Siren




Re: Error running intranet crawl with 0.8.0-dev

Posted by Zaheed Haque <za...@gmail.com>.
Hi:

Create a directory "crawldb",
then create a subdirectory "current" under "crawldb",

then run bin/nutch inject crawldb urldir.

The latest SVN version already fixes the problem, but the fix was only committed today.
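Zaheed's workaround can be sketched as a short shell script. This is a minimal sketch under stated assumptions: it is run from the Nutch install directory, and the seed URL lists live in "urls" as in Daniel's command. Zaheed's "urldir" is only a placeholder, and the variable names CRAWLDB and URLDIR are illustrative. The script pre-creates crawldb/current and then runs the inject step. It does not replace upgrading to a nightly build that contains the fix.

```shell
#!/bin/sh
# Workaround sketch for the "crawldb/current ... in local is invalid" error
# seen with the 0.8.0-dev nightly: pre-create the directory the injector
# expects before injecting.
# Assumptions (illustrative, adjust to your setup): run from the Nutch
# install directory; seed URL lists are in "urls".
CRAWLDB=crawldb
URLDIR=urls

# Create "crawldb" and its "current" subdirectory in one step.
mkdir -p "$CRAWLDB/current"

# Inject the seed URLs, but only if the nutch launcher is present here.
if [ -x bin/nutch ]; then
  bin/nutch inject "$CRAWLDB" "$URLDIR"
else
  echo "bin/nutch not found; run this from the Nutch install directory" >&2
fi
```

`mkdir -p` creates any missing parent directories and does not fail if they already exist, so the script can safely be re-run.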

Cheers
