Posted to user@nutch.apache.org by Daniel Varela Santoalla <dv...@ecmwf.int> on 2006/07/12 19:31:44 UTC
Error running intranet crawl with 0.8.0-dev
Hello
I'm having this problem when trying to test intranet crawling. Could
anybody help with this? I've tried a lot of things, but without results.
0.7.2 works fine, but I wanted to test the new version.
daniel@amorgen:~/tmp/devel/nutch-nightly/bin> ./nutch crawl urls/
-threads 5 -depth 10 -topN 1000 -dir crawl_results
Exception in thread "main" java.io.IOException: Input directory
/var/tmp/daniel/devel/nutch-nightly/bin/crawl_results/crawldb/current in
local is invalid.
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
--
Daniel Varela Santoalla
European Centre for Medium-Range Weather Forecasts (ECMWF)
Re: Error running intranet crawl with 0.8.0-dev
Posted by manish kothari <ma...@universeinfosys.com>.
Hello Daniel,
I looked at your problem. Please check that your environment variables are
set properly, in particular NUTCH_HOME, and read through the nutch shell
script (bin/nutch) first. I think that should solve your problem. If it
still does not work, you can send me mail.
Bye
Re: Error running intranet crawl with 0.8.0-dev
Posted by Sami Siren <ss...@gmail.com>.
This error was fixed today in trunk, so the fix will be available in the
next nightly build.
--
Sami Siren
Re: Error running intranet crawl with 0.8.0-dev
Posted by Zaheed Haque <za...@gmail.com>.
Hi,
Create a directory "crawldb",
then create a subdirectory "current" under "crawldb",
then run bin/nutch inject crawldb urldir.
The latest SVN version already fixes the problem, but the fix was only committed today.
Cheers
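
For anyone stuck on an older nightly, the workaround described above might look roughly like the following shell session. This is only a sketch under assumptions: the directory names crawl_results and urls are taken from the original command, the variable names are made up for illustration, and the relative paths resolve against whatever directory you run it from (the original report ran from inside bin/). Whether an empty "current" directory is enough may depend on the nightly build in use.

```shell
# Workaround sketch for the "Input directory ... is invalid" error.
# Assumption: run from the Nutch install directory; adjust paths otherwise.
CRAWL_DIR="crawl_results"   # value passed to -dir in the original command
URL_DIR="urls"              # directory holding the seed URL list
NUTCH="bin/nutch"           # path to the nutch launcher script

# Pre-create crawldb/current so the inject job finds its input directory.
mkdir -p "$CRAWL_DIR/crawldb/current"

# Run the inject step only if the launcher is actually present here.
if [ -x "$NUTCH" ]; then
    "$NUTCH" inject "$CRAWL_DIR/crawldb" "$URL_DIR"
else
    echo "nutch launcher not found at $NUTCH; created $CRAWL_DIR/crawldb/current only"
fi
```

Once a nightly build containing the fix is installed, the manual mkdir step should no longer be needed.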