Posted to user@nutch.apache.org by carmmello <ca...@globo.com> on 2005/12/22 21:12:25 UTC
New Tutorial Needed
I have downloaded the latest Nutch nightly version, from 2005-12-18, and
tried to run it as usual (using the crawl method). As a result, I got
the following error messages:
"051222 175202 parsing file:/usr/nutch-nightly/conf/nutch-site.xml
java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/localRunner/job_xom6lb.xml , nutch-site.xml
    at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
    at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
    at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
051222 175203 map 0%
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
[root@localhost nutch-nightly]# "
Also, reviewing some posts, I came across the following statement:
"The next version will be map reduce based in any case. So 0.7 is
already the 'old' one and people will not continue to develop it (maybe
just some maintenance releases). Map reduce doesn't mean that you need
more than one computer or need the ndfs. It is just a technology to
process large data sets." (Stefan Groschupf)
I know that the new official release of Nutch (0.8, I think) is not
out yet, but I think a new tutorial is needed on how to set up
Nutch to run properly, as the tutorial on the Nutch site, it seems,
cannot cope with the new nightly distributions or with the releases
that will follow.
Thanks
Re: New Tutorial Needed
Posted by Stefan Groschupf <sg...@media-style.com>.
Hi,
for clarification:
The next release will be 0.7.2. It will mostly fix a set of bugs and
add a few new features, but it does not use map reduce.
However, the actual source stream (trunk) is 0.8 (map reduce), and
there is a 0.7 branch so that bugs in 0.7.x can be fixed, so there may
be more maintenance releases.
In any case, developers are focused on developing the map reduce based
tool.
Stefan
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net
Re: New Tutorial Needed
Posted by Doug Cutting <cu...@nutch.org>.
carmmello wrote:
> I have downloaded the latest Nutch nightly version, from 2005-12-18, and
> tried to run it as usual (using the crawl method). As a result, I got
> the following error messages:
Where are you finding the tutorial? The version on the website is for
the currently released version, 0.7.1. There is a newer tutorial
bundled with 0.8. Please refer to the tutorial included with the code
you download.
Doug
Re: New Tutorial Needed
Posted by Raghavendra Prabhu <rr...@gmail.com>.
Hi guys,
It is the same problem.
In the new Nutch, the URL folder can contain many flat files with URLs
in them.
In the nutch-0.7.1 version, it was a single file.
We could have a separate tutorial for each version.
The latest Nutch also uses the new crawl implementation,
whereas nutch-0.7.1 uses org.apache.nutch.tools.CrawlTool for crawling.
Hope this helps
Raghavendra Prabhu
Re: New Tutorial Needed
Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,
I found the solution to the original problem at the beginning of this
mail thread. I am not sure if anybody is still interested in it, but
here it is anyway:
The problem is very simple. The current nutch-trunk version requires
the initial URL list to be stored in a folder. In other words, when using
the crawl command (as in the following example: "bin/nutch crawl urls -dir
some_dir -depth n"), the urls argument MUST be a folder and not a
flat file.
This is one of the (minor) changes made to Nutch when it matured from
nutch-0.7.x to nutch-0.8. I didn't notice it at first. Again, this
is a simple issue, but in case anybody is still interested....
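A minimal sketch of that change as a shell script, assuming it is run from the Nutch install directory (or with a hypothetical NUTCH_HOME variable pointing at it). The file name seed.txt, the example URL and the depth of 3 are placeholders chosen for illustration; only the crawl command line itself comes from the example above.

```shell
#!/bin/sh
# Sketch only: prepare a seed *directory* for the trunk (0.8-dev) crawl command.
# NUTCH_HOME, seed.txt and the example URL are assumptions, not Nutch requirements.
set -eu

NUTCH_HOME="${NUTCH_HOME:-.}"
SEED_DIR="$NUTCH_HOME/urls"

if [ -f "$SEED_DIR" ]; then
  # 0.7-style layout: "urls" is a flat file. Move it inside a new directory.
  mv "$SEED_DIR" "$SEED_DIR.flat"
  mkdir "$SEED_DIR"
  mv "$SEED_DIR.flat" "$SEED_DIR/seed.txt"
else
  # No flat file yet: create the directory and a placeholder seed list.
  mkdir -p "$SEED_DIR"
  if [ ! -e "$SEED_DIR/seed.txt" ]; then
    printf '%s\n' 'http://www.example.com/' > "$SEED_DIR/seed.txt"
  fi
fi

# Run the crawl only when the launcher script is actually present.
if [ -x "$NUTCH_HOME/bin/nutch" ]; then
  (cd "$NUTCH_HOME" && bin/nutch crawl urls -dir some_dir -depth 3)
else
  echo "bin/nutch not found under $NUTCH_HOME; seed directory is ready at $SEED_DIR"
fi
```

With "urls" as a directory, the inject step has an input directory to read, which is presumably what the "No input directories specified" error was complaining about.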
Regards,
Lukas
Re: New Tutorial Needed
Posted by Stefan Groschupf <sg...@media-style.com>.
Hi
> I was struggling with the same problem two days ago. I posted the problem
> to the nutch-dev mailing list under the following subject: "nutch-0.8-dev
> *mapred.input.subdir* problem
As soon as I find some time over the next few days, I will try to
reproduce the problem.
Meanwhile, it would be good to know whether you see the problem only
with the nightly build, or whether it also occurs with a build from the
latest sources in subversion.
Stefan
Re: New Tutorial Needed
Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,
I was struggling with the same problem two days ago. I posted the problem
to the nutch-dev mailing list under the following subject: "nutch-0.8-dev
*mapred.input.subdir* problem ?".
Stefan and Paul responded, but I am not sure their suggestions will solve
my problem (to be honest, I didn't have time to test them fully).
To me it seems that the problem may be related to how the
mapred.input.subdir property is set (I was looking into the code). But I
was not able to find anything about the mapred.input.subdir property on
the web.
Now I know that I am not the only one who has this problem, so either
both of us are doing something wrong or there is a real problem in the
latest nutch trunk package.
Is anybody else facing this problem?
Lukas