Posted to dev@nutch.apache.org by Fuad Efendi <fu...@efendi.ca> on 2005/08/15 21:41:57 UTC
MapRed - Injector - urlDir - Format?
Which parameter should I pass to Crawl? It should be a directory
containing something, but in which format?
Thanks,
Fuad
RE: MapRed - Injector - urlDir - Format?
Posted by Fuad Efendi <fu...@efendi.ca>.
I downloaded the code just a few hours ago. I am on Windows XP; I have
SuSE Linux 9.3 on another PC, but I am too lazy to switch.
If nobody has this error under Linux, I suppose the problem is on my
side.
I run this inside Eclipse, J2SE 1.4.2_08, with classpath links to CONF
and the directory containing the plugins. I need to check the code; maybe
it is something Windows-specific (for instance, I am browsing a folder,
and Crawl can't delete files).
Thanks
050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:94)
at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:83)
at org.apache.nutch.mapred.MapTask.run(MapTask.java:65)
at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:65)
050815 162138 map 0%
java.io.IOException: Job failed!
at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:309)
at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:52)
at org.tokenizer.crawl.Crawl.main(Crawl.java:112)
Exception in thread "main"
-----Original Message-----
From: Doug Cutting [mailto:cutting@nutch.org]
Sent: Monday, August 15, 2005 6:10 PM
To: nutch-dev@lucene.apache.org
Subject: Re: MapRed - Injector - urlDir - Format?
Fuad Efendi wrote:
> It works now: I pass a folder to Crawl containing a plain text file
> with URLs. I am testing, and I pass a single URL.
>
> At some point I have:
> 050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
> 050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
> java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
> at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
> at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
That looks perhaps like an older version of the code. Are you running
recent code from the mapred branch? Also, I have not yet tried the
recent mapred code on Win32, only on Linux.
Doug
Re: MapRed - Injector - urlDir - Format?
Posted by Doug Cutting <cu...@nutch.org>.
Fuad Efendi wrote:
> It works now: I pass a folder to Crawl containing a plain text file
> with URLs. I am testing, and I pass a single URL.
>
> At some point I have:
> 050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
> 050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
> java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
> at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
> at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
That looks perhaps like an older version of the code. Are you running
recent code from the mapred branch? Also, I have not yet tried the
recent mapred code on Win32, only on Linux.
Doug
RE: MapRed - Injector - urlDir - Format?
Posted by Fuad Efendi <fu...@efendi.ca>.
Thanks,
It works now: I pass a folder to Crawl containing a plain text file
with URLs. I am testing, and I pass a single URL.
At some point I have:
050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
java.io.IOException: File already exists:\tmp\nutch\mapred\local\map_pel04v\part-0.out
at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
Fuad
-----Original Message-----
From: Doug Cutting [mailto:cutting@nutch.org]
Sent: Monday, August 15, 2005 4:30 PM
To: nutch-dev@lucene.apache.org
Subject: Re: MapRed - Injector - urlDir - Format?
Fuad Efendi wrote:
> Which parameter should I pass to Crawl? It should be a directory
> containing something, but in which format?
As before, inject takes a flat text file of URLs, one per line. If you
wish to inject DMOZ URLs, there is now a utility main() that will
convert the DMOZ file to such a file.
Doug
Re: MapRed - Injector - urlDir - Format?
Posted by Doug Cutting <cu...@nutch.org>.
Fuad Efendi wrote:
> Which parameter should I pass to Crawl? It should be a directory
> containing something, but in which format?
As before, inject takes a flat text file of URLs, one per line. If you
wish to inject DMOZ URLs, there is now a utility main() that will
convert the DMOZ file to such a file.
Doug
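For readers who want to see the input format described in this thread
(a directory containing a flat text file of URLs, one per line), here is
a minimal sketch in plain Java. It does not use any Nutch API. The class
name SeedDir, the method writeSeedFile, the file name seeds.txt, and the
example URL are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: prepares a directory holding
// a flat text file of URLs, one per line.
class SeedDir {

    // Creates dir if needed, writes each URL on its own line to
    // dir/fileName (overwriting any existing file), and returns the file path.
    static Path writeSeedFile(Path dir, String fileName, List<String> urls)
            throws IOException {
        Files.createDirectories(dir);
        Path seedFile = dir.resolve(fileName);
        return Files.write(seedFile, urls, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the urlDir argument (assumption).
        Path dir = Files.createTempDirectory("urls");
        Path seedFile = writeSeedFile(dir, "seeds.txt",
                Arrays.asList("http://example.com/"));
        System.out.println("Seed directory: " + dir);
        System.out.println("Seed file lines: "
                + Files.readAllLines(seedFile, StandardCharsets.UTF_8));
    }
}
```

The directory path printed by main is what would be passed as the urlDir
argument in a setup like the one discussed above; how Crawl itself is
invoked depends on the version of the mapred branch in use.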