Posted to dev@nutch.apache.org by nishant jani <ni...@gmail.com> on 2015/02/07 23:56:48 UTC

Isnt the All-in-one Crawl Deprecated?

Hello All,

I have been following the Nutch tutorial at
http://wiki.apache.org/nutch/NutchTutorial
which gives the following command to run:

bin/nutch bin/crawl urls -dir crawl -depth 3 -topN 5

This fails with: Error: Could not find or load main class bin.crawl

A quick glance through the nutch script in the bin folder reveals that it
takes the second parameter passed to nutch (bin/crawl in this case) and
compares it against the various nutch commands such as inject, generate,
etc. If no match is found, the script runs the parameter as a Java class.
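
The dispatch described above can be sketched roughly as follows. This is a
simplified illustration under stated assumptions, not the actual bin/nutch
source: the function name resolve_class is hypothetical, and only two of the
known commands are shown. An unrecognized argument falls through and is
treated as a class name, which would explain why bin/crawl ends up being
reported as the main class bin.crawl.

```shell
#!/bin/sh
# Simplified sketch of the command dispatch; NOT the real bin/nutch script.
# resolve_class is a hypothetical helper introduced for illustration only.
resolve_class() {
  case "$1" in
    inject)   echo "org.apache.nutch.crawl.Injector" ;;
    generate) echo "org.apache.nutch.crawl.Generator" ;;
    # No known command matched: the argument itself is used as the class name.
    *)        echo "$1" ;;
  esac
}

resolve_class inject      # prints org.apache.nutch.crawl.Injector
resolve_class bin/crawl   # prints bin/crawl (later handed to java as a class)
```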

This would have worked prior to this git commit:
https://github.com/apache/nutch/commit/d3f2dd1bbad7c9d69a38ef9e6e756003a45da9e7


As a result, the script throws an error when bin/crawl is passed as the
second parameter. I think there is a discrepancy between the tutorial and
the code.

I may be completely off, but is anyone else facing the same issue?

Thank you.



Re: Isnt the All-in-one Crawl Deprecated?

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Nishant,

You are entirely correct. The all-in-one crawl script is now ./bin/crawl.

If you have time to update the tutorials we would welcome it!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





