Posted to user@nutch.apache.org by Apurv Verma <da...@gmail.com> on 2012/03/24 13:51:58 UTC

Problems in Getting the tutorial running.

Hello,
 I was following the tutorial given on the wiki page.
http://wiki.apache.org/nutch/NutchTutorial
I am getting the following error when I do a fetch.

apurv@deepu:~/nutch-1.4/runtime/local$ ./bin/nutch fetch $s1
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2012-03-24 18:16:19
Fetcher: segment: ls
Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/apurv/nutch-1.4/runtime/local/ls/crawl_generate
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:105)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1204)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1240)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1213)

apurv@deepu:~/nutch-1.4/runtime/local$


I noticed that nowhere in the tutorial do we create a directory local/ls,
which would explain this error. Please help me solve it.


--
thanks and regards,

Apurv Verma
B. Tech.(CSE)
IIT- Ropar

Re: Problems in Getting the tutorial running.

Posted by meisyathedream <me...@gmail.com>.
I had the same problem before.
I tried editing nutch-site.xml so that
http.agent.name is the same as http.robots.agents.
Example:



Now I have another error message:



Please help me solve it,
and explain the meaning of "ls -d crawldb/segments/2* | tail -1".
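
For reference, the quoted pipeline can be tried on a throwaway directory. The sketch below is only an illustration: the temporary directory, the two timestamp-style segment names, and the helper first_word are made up for the demo and do not come from the thread or the tutorial. It also shows one possible explanation (an assumption, not confirmed anywhere in this thread) for the log line "Fetcher: segment: ls": if the command is stored as plain text instead of being run inside backticks, the first word handed to the next command is the literal string "ls".

```shell
# Hypothetical demo layout; real segment names are timestamps created by Nutch.
demo=$(mktemp -d)
mkdir -p "$demo/crawldb/segments/20120324181500" \
         "$demo/crawldb/segments/20120324181619"
cd "$demo"

# ls -d lists the matching directories themselves, not their contents.
# The glob 2* matches the timestamp-named segments, and tail -1 keeps the
# last line, which is the newest segment because timestamps sort in order.
# The backticks run the pipeline and store its output in s1.
s1=`ls -d crawldb/segments/2* | tail -1`
echo "$s1"

# Assumed mistake: single quotes instead of backticks. Nothing is executed;
# the variable just holds the text of the command.
wrong='ls -d crawldb/segments/2* | tail -1'

# Hypothetical helper that reports the first argument it receives, standing
# in for a program that treats its first argument as the segment path.
first_word() { echo "$1"; }
first_arg=$(first_word $wrong)
echo "$first_arg"
```

Running "echo $s1" right after the assignment is a quick way to check that the variable holds a segment path before passing it to a later command.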



--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-in-Getting-the-tutorial-running-tp3853917p3876792.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Problems in Getting the tutorial running.

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Apurv,

Please read through the tutorial again. It appears that you've missed more
than one step in the tutorial. I can assure you that if you follow it, it
will work.

Thanks

Lewis

On Sat, Mar 24, 2012 at 12:51 PM, Apurv Verma <da...@gmail.com> wrote:

> Hello,
>  I was following the tutorial given on the wiki page.
> http://wiki.apache.org/nutch/NutchTutorial
> I am getting the following error when I do a fetch.
>
> apurv@deepu:~/nutch-1.4/runtime/local$ ./bin/nutch fetch $s1
> Fetcher: Your 'http.agent.name' value should be listed first in
> 'http.robots.agents' property.
> Fetcher: starting at 2012-03-24 18:16:19
> Fetcher: segment: ls
> Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does
> not exist: file:/home/apurv/nutch-1.4/runtime/local/ls/crawl_generate
>  at
>
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
> at
>
> org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>  at
> org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:105)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>  at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1204)
>  at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1240)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>  at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1213)
>
> apurv@deepu:~/nutch-1.4/runtime/local$
>
>
> I noticed that nowhere in the tutorial we are creating a directory local/ls
> and so this error. Please help me solve it.
>
>
> --
> thanks and regards,
>
> Apurv Verma
> B. Tech.(CSE)
> IIT- Ropar
>



-- 
*Lewis*
