
Posted to user@nutch.apache.org by Imtiaz Shakil Siddique <sh...@gmail.com> on 2015/09/09 23:09:45 UTC

Document scores(boost)

Hello,
I've been using Nutch 1.9/1.10 for about six months. One thing I noticed is
that at each iteration (during the parsing phase) Nutch calculates a document
boost (using the OPIC algorithm).

1. How is this score adjusted with respect to all the segments?

2. Also, inside the bin/crawl script, what do webgraph, linkrank,
scoreupdater and nodedumper do? Could anyone kindly explain?
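Question 1 can be illustrated with a deliberately simplified model of OPIC-style scoring. This is a sketch under stated assumptions, not Nutch's OPICScoringFilter code: during parsing, a fetched page's current score is split evenly over its outlinks, and updatedb adds the received shares on top of each target's existing crawldb score. The seed score of 1.0, the three-page link graph and the function name `opic_round` are hypothetical. In this model no step renormalises scores across segments, so refetching the same pages keeps inflating them.

```python
from collections import defaultdict


def opic_round(crawldb, segment_links):
    """Simplified model of one fetch/parse/updatedb cycle with OPIC-style scoring.

    crawldb: dict mapping url -> current score (updated in place and returned)
    segment_links: dict mapping each url fetched in this segment -> its outlinks
    """
    received = defaultdict(float)
    for url, outlinks in segment_links.items():
        if not outlinks:
            continue  # a page without outlinks hands out nothing in this model
        # Parse phase: split the page's current score evenly over its outlinks.
        share = crawldb.get(url, 0.0) / len(outlinks)
        for target in outlinks:
            received[target] += share
    # updatedb phase: add the received shares on top of each target's existing score.
    # In this model nothing is renormalised across segments and the source page
    # keeps its own score, so the total grows every time pages are refetched.
    for target, cash in received.items():
        crawldb[target] = crawldb.get(target, 0.0) + cash
    return crawldb


# Hypothetical three-page site, every page refetched in each new segment.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
db = {"a": 1.0, "b": 1.0, "c": 1.0}  # assumed injected score of 1.0 per seed
for segment in range(3):
    opic_round(db, links)
    print(segment, db, sum(db.values()))
# The total score doubles with each segment here: 6.0, 12.0, 24.0
```

The point of the toy run is the trend rather than the numbers: scores reflect how often pages were refetched, not a fresh evaluation of the link graph over all segments.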

Thank you so much.
Imtiaz Shakil Siddique

Re: Document scores(boost)

Posted by Imtiaz Shakil Siddique <sh...@gmail.com>.

Hello Markus Jelsma,

So you are suggesting that, if I want to calculate the document boost with
all segments in hand, I should:
1. remove the "scoring-opic" plugin, and
2. run webgraph > linkrank > scoreupdater from the bin/crawl script.


It'd be very helpful if you could explain what these four steps (webgraph,
linkrank, scoreupdater, nodedumper) do.
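The roles of the four bin/crawl steps can be pictured with a small in-memory model. This is a hedged sketch of what each stage is for, not Nutch's MapReduce implementation, and all function names are hypothetical: webgraph merges the outlinks found in all segments into one link graph (nodes, outlinks, inlinks); linkrank runs a PageRank-style iteration over that whole graph; scoreupdater copies the resulting node scores into the crawldb; nodedumper lists the top nodes, here by score, for inspection. The damping factor of 0.85, the 10 iterations and the fallback score of 0.0 for crawldb entries missing from the graph are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_webgraph(segments):
    """webgraph stage (simplified): merge outlinks from all segments, derive inlinks."""
    outlinks = defaultdict(set)
    for segment in segments:
        for url, targets in segment.items():
            outlinks[url].update(targets)
    inlinks = defaultdict(set)
    nodes = set(outlinks)
    for url, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(url)
            nodes.add(target)
    return nodes, outlinks, inlinks


def link_rank(nodes, outlinks, inlinks, damping=0.85, iterations=10):
    """linkrank stage (simplified): PageRank-style iteration over the whole graph."""
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Each node keeps a base share and receives a damped, equal split of the
        # score of every page linking to it. Pages without outlinks leak score here.
        score = {
            u: (1 - damping) / n
            + damping * sum(score[src] / len(outlinks[src]) for src in inlinks.get(u, ()))
            for u in nodes
        }
    return score


def update_scores(crawldb, scores, missing_score=0.0):
    """scoreupdater stage (simplified): replace crawldb scores with graph scores."""
    return {url: scores.get(url, missing_score) for url in crawldb}


def dump_nodes(scores, top_n=3):
    """nodedumper stage (simplified): list the top-N nodes by score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical crawl: two segments, plus a crawldb entry "d" the graph never saw.
segments = [{"a": ["b", "c"]}, {"b": ["c"], "c": ["a"]}]
nodes, out, inl = build_webgraph(segments)
ranks = link_rank(nodes, out, inl)
crawldb = update_scores({"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}, ranks)
print(dump_nodes(ranks))  # highest-scoring nodes first: c, then a, then b
```

Because linkrank recomputes every node's score from the complete graph, rerunning this chain after new segments arrive replaces the previous scores instead of adding to them, which in this model is exactly what the incremental OPIC scores lack.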

Thank you so much for the help.
Imtiaz Shakil Siddique


On 10 September 2015 at 19:27, Markus Jelsma <ma...@openindex.io>
wrote:

> Hello - OPIC is useless in incremental crawls. You can either disable
> scoring altogether, or use webgraph > linkrank > scoreupdater.
> Markus

RE: Document scores(boost)

Posted by Markus Jelsma <ma...@openindex.io>.

Hello - OPIC is useless in incremental crawls. You can either disable scoring altogether, or use webgraph > linkrank > scoreupdater.
Markus
 