Posted to user@nutch.apache.org by Srinivasan Ramaswamy <ur...@gmail.com> on 2018/06/07 19:49:13 UTC

Some URLs have a score of Infinity while others have a very low score

Hi All

*Background*
I would like to crawl 10-20 domains and all the pages underneath. I have a
Nutch crawler that's running continuously.

*Problem*
I am trying to investigate why some URLs are still not in the index, even
though they were created or updated a month ago. During the investigation,
I found that many URLs have a score of "Infinity". I am using
"scoring-opic" in my nutch-default configuration. The URL in question has a
very low score (1.5514997E-4). I am afraid that the missing URL never gets
picked for fetching.

*Questions*:
1. Is OPIC scoring the best choice for my use case (10-20 domains)?
If not, can you recommend another solution that has worked for you?
2. Is the score "Infinity" a bug, or a feature meant to indicate that these
are very important pages? When I look at those URLs, they don't look that
important to me. I don't understand how they got such a high score.

Thanks
Srini