Posted to user@nutch.apache.org by Chip Calhoun <cc...@aip.org> on 2017/01/31 16:49:13 UTC

Need help installing scoring-depth plugin

I'm upgrading from Nutch 1.4 to Nutch 1.12. I limit this crawl to my seeds, so my 1.4 command was:
bin/nutch crawl phfaws -dir crawl -depth 1 -topN 50000

My understanding is that the "crawl" command is deprecated, "-depth" went with it, and I need to install the scoring-depth plugin. I'm new to adding plugins. The instructions at https://wiki.apache.org/nutch/AboutPlugins give a sample command, but I don't know what the official PluginRepository for this plugin is and the sample link for the HtmlParser plugin is dead.

I'll appreciate any help. Thank you!

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740
301-209-3180
https://www.aip.org/history-programs/niels-bohr-library


RE: Need help installing scoring-depth plugin

Posted by Chip Calhoun <cc...@aip.org>.
Thank you Julien! That's exactly what I needed.

Chip

-----Original Message-----
From: Julien Nioche [mailto:lists.digitalpebble@gmail.com] 
Sent: Tuesday, January 31, 2017 1:09 PM
To: user@nutch.apache.org
Subject: Re: Need help installing scoring-depth plugin

You don't need to install scoring-depth. It's just that the term 'depth' in the old crawl class has been replaced by 'rounds', which is more accurate.

The equivalent of the command you used to call should be: bin/crawl phfaws crawl 1

The value for topN needs to be set in the crawl script; see sizeFetchlist in https://github.com/apache/nutch/blob/master/src/bin/crawl#L117

HTH

Julien


Re: Need help installing scoring-depth plugin

Posted by Julien Nioche <li...@gmail.com>.
You don't need to install scoring-depth. It's just that the term 'depth' in
the old crawl class has been replaced by 'rounds', which is more accurate.

The equivalent of the command you used to call should be:
bin/crawl phfaws crawl 1

The value for topN needs to be set in the crawl script; see sizeFetchlist in
https://github.com/apache/nutch/blob/master/src/bin/crawl#L117
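
To make the mapping concrete, here is a rough sketch of how the old 1.4 arguments line up with the positional form given above (seed directory, crawl directory, number of rounds). The helper translate_crawl_args is hypothetical and not part of Nutch; it only prints the command it would suggest and runs nothing. It assumes -dir and -depth are both supplied, and it treats -topN as a reminder to edit sizeFetchlist in the crawl script, since the thread names no command-line equivalent for it.

```shell
# Hypothetical helper (not part of Nutch): maps the arguments of the old
# "bin/nutch crawl <seeds> -dir <dir> -depth <n> -topN <n>" form onto the
# positional "bin/crawl <seeds> <dir> <rounds>" form described in this thread.
translate_crawl_args() {
  seed_dir=$1
  shift
  crawl_dir=
  rounds=
  top_n=
  while [ $# -gt 0 ]; do
    case $1 in
      -dir|-depth|-topN)
        # Every recognised flag must be followed by a value.
        [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case $1 in
          -dir)   crawl_dir=$2 ;;
          -depth) rounds=$2 ;;   # 'depth' is now called 'rounds'
          -topN)  top_n=$2 ;;
        esac
        shift 2 ;;
      *)
        shift ;;                 # this sketch ignores anything else
    esac
  done
  # Assumption of this sketch: no defaults are guessed for missing values.
  if [ -z "$crawl_dir" ] || [ -z "$rounds" ]; then
    echo "need both -dir and -depth" >&2
    return 1
  fi
  printf '%s\n' "bin/crawl $seed_dir $crawl_dir $rounds"
  if [ -n "$top_n" ]; then
    # Reminder only: the value has to be edited into the crawl script by hand.
    printf '%s\n' "note: set sizeFetchlist=$top_n in the crawl script" >&2
  fi
}

translate_crawl_args phfaws -dir crawl -depth 1 -topN 50000
```

For the command in the original question, the sketch prints the same bin/crawl line given above on standard output and the sizeFetchlist reminder on standard error.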

HTH

Julien

-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>