Posted to dev@nutch.apache.org by Fuad Efendi <fu...@efendi.ca> on 2005/08/15 21:41:57 UTC

MapRed - Injector - urlDir - Format?

Which parameter should I pass to Crawl? It should be a directory
containing something, but in which format?
Thanks,
Fuad

RE: MapRed - Injector - urlDir - Format?

Posted by Fuad Efendi <fu...@efendi.ca>.
I downloaded the code just a few hours ago. I am on Windows XP; I have
SUSE Linux 9.3 on another PC, but I am too lazy to switch.

If nobody sees this error under Linux, I suppose I am doing something
wrong.

I run this inside Eclipse with J2SE 1.4.2_08, with classpath entries
pointing to the conf directory and to the directory containing the
plugins. I need to check the code; it may be something specific to
Windows (for instance, I am browsing a folder, so Crawl can't delete
files).

Thanks


050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
	at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
	at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
	at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:94)
	at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:83)
	at org.apache.nutch.mapred.MapTask.run(MapTask.java:65)
	at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:65)
050815 162138  map 0%
java.io.IOException: Job failed!
	at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:309)
	at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:52)
	at org.tokenizer.crawl.Crawl.main(Crawl.java:112)
Exception in thread "main" 




Re: MapRed - Injector - urlDir - Format?

Posted by Doug Cutting <cu...@nutch.org>.
Fuad Efendi wrote:
> It works now: I pass Crawl a folder containing a plain text file with
> URLs. For testing, I pass a single URL.
> 
> At some point I get:
> 050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
> 050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
> java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
> 	at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
> 	at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)

That looks perhaps like an older version of the code.  Are you running 
recent code from the mapred branch?  Also, I have not yet tried the 
recent mapred code on Win32, only on Linux.

Doug

RE: MapRed - Injector - urlDir - Format?

Posted by Fuad Efendi <fu...@efendi.ca>.
Thanks,

It works now: I pass Crawl a folder containing a plain text file with
URLs. For testing, I pass a single URL.

At some point I get:
050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
	at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
	at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)

Fuad


Re: MapRed - Injector - urlDir - Format?

Posted by Doug Cutting <cu...@nutch.org>.
Fuad Efendi wrote:
> Which parameter should I pass to Crawl? It should be a directory
> containing something, but in which format?

As before, inject takes a flat text file of URLs, one per line.  If you 
wish to inject DMOZ URLs, there is now a utility main() that will 
convert the DMOZ file to such a file.

Doug
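
For readers following the thread, here is a minimal sketch of preparing
the input described above: a directory holding a flat text file with one
URL per line. The directory name "urls", the file name "seed.txt", and
the class SeedDir are hypothetical choices for illustration, not names
taken from Nutch. The sketch also uses java.nio.file, which needs Java 7
or later, so it would not compile on the J2SE 1.4.2 setup mentioned
earlier in the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class SeedDir {
    // Writes the given URLs, one per line, to <dir>/seed.txt and returns
    // the path of the written file. "seed.txt" is a hypothetical name.
    static Path writeSeeds(Path dir, List<String> urls) throws IOException {
        Files.createDirectories(dir);          // create the URL directory if missing
        Path file = dir.resolve("seed.txt");
        return Files.write(file, urls, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A single URL, as in the test run described in the thread.
        Path file = writeSeeds(Paths.get("urls"),
                               Arrays.asList("http://example.com/"));
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

The resulting directory (here "urls") would then be the value passed as
the URL directory argument; how Crawl names that argument is not stated
in this thread.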