
Posted to user@nutch.apache.org by Rum Raisin <ru...@yahoo.com> on 2011/11/12 20:38:40 UTC

Input path does not exist (parse_data)

I get this error running Nutch trunk under Eclipse.
I don't understand what the problem is. It already created other directories like...


Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/jeff/workspace/nutch-trunk/crawl/segments/20111112043120/parse_data
Input path does not exist: file:/home/jeff/workspace/nutch-trunk/crawl/segments/20111112042823/parse_data
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

Re: Input path does not exist (parse_data)

Posted by Lewis John Mcgibbney <le...@gmail.com>.

By the looks of it, there was a problem parsing the data in these
segments (the error names two of them). Please try reparsing them.
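
A minimal sketch of how the reparse could be driven from a shell, assuming a
local crawl directory laid out as in the error message (crawl/segments/<timestamp>).
The helper name list_unparsed_segments is hypothetical. The bin/nutch parse and
bin/nutch invertlinks commands are the Nutch 1.x command-line tools, but the
paths passed to them here are assumptions, so they are left commented out.

```shell
#!/bin/sh
# Print every segment under the given directory that has no parse_data
# sub-directory. list_unparsed_segments is a hypothetical helper name.
list_unparsed_segments() {
    segments_dir=$1
    for seg in "$segments_dir"/*/; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$seg" ] || continue
        seg=${seg%/}
        [ -d "$seg/parse_data" ] || printf '%s\n' "$seg"
    done
}

# Assumed usage from the Nutch directory (paths are assumptions based on
# the layout shown in the error message):
#
#   list_unparsed_segments crawl/segments | while IFS= read -r seg; do
#       bin/nutch parse "$seg"
#   done
#   bin/nutch invertlinks crawl/linkdb -dir crawl/segments
```

Parsing reads the fetched content of a segment, so a segment that was only
generated and never fetched would probably need to be fetched (or removed)
first rather than reparsed.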

On Sat, Nov 12, 2011 at 11:46 AM, Rum Raisin <ru...@yahoo.com> wrote:

> Sorry continuing, since yahoo keyboard shortcuts triggered premature
> email...
>
> It already created other directories like the ones below, with directories
> like crawl_generate under them. But why does it give this error? Could it
> not create the parse_data directory earlier that it's expecting now? Or does
> it think there should be data in that directory when there's nothing there?
>
> /nutch-trunk/crawl/segments/20111112043249
> /nutch-trunk/crawl/segments/20111112043120
> /nutch-trunk/crawl/segments/20111112043717
> /nutch-trunk/crawl/segments/20111112042823
> /nutch-trunk/crawl/segments/20111112043256



-- 
*Lewis*

Re: Input path does not exist (parse_data)

Posted by Rum Raisin <ru...@yahoo.com>.

Sorry, continuing, since Yahoo keyboard shortcuts triggered a premature email...

It already created other directories like the ones below, with directories like crawl_generate under them. But why does it give this error? Could it not create the parse_data directory earlier that it's expecting now? Or does it think there should be data in that directory when there's nothing there?

/nutch-trunk/crawl/segments/20111112043249
/nutch-trunk/crawl/segments/20111112043120
/nutch-trunk/crawl/segments/20111112043717
/nutch-trunk/crawl/segments/20111112042823
/nutch-trunk/crawl/segments/20111112043256
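
One way to see which step each segment stopped at is to check which
sub-directories it contains. The sketch below assumes the usual Nutch 1.x
segment layout (crawl_generate, crawl_fetch, content, crawl_parse, parse_data,
parse_text) on a local filesystem. The function name report_segment_parts is
hypothetical and not part of Nutch.

```shell
#!/bin/sh
# For each segment, list the expected sub-directories that are absent.
# report_segment_parts is a hypothetical helper, not a Nutch command.
report_segment_parts() {
    segments_dir=$1
    for seg in "$segments_dir"/*/; do
        # Skip the literal pattern when the directory has no sub-directories.
        [ -d "$seg" ] || continue
        seg=${seg%/}
        missing=""
        for part in crawl_generate crawl_fetch content crawl_parse parse_data parse_text; do
            [ -d "$seg/$part" ] || missing="$missing $part"
        done
        # Prints e.g. "<segment>: missing parse_data parse_text" or "<segment>: missing none".
        printf '%s: missing%s\n' "${seg##*/}" "${missing:- none}"
    done
}

# Assumed usage (path is an assumption based on the listing above):
#   report_segment_parts crawl/segments
```

A segment that reports only crawl_generate as present was generated but never
fetched, which would also explain an absent parse_data.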

