Posted to user@nutch.apache.org by Yulio Aleman Jimenez <yu...@uci.cu> on 2016/04/29 22:47:32 UTC
Prioritize links in the Fetching Step
Hi.
I'm using Nutch 1.9 with Solr 4.10 in a local environment.
I need a way to prioritize certain links in the fetching step by filtering the new links discovered in previous crawls according to some criterion, for example the file extension of the resource. The goal is to fetch images, documents, and similar resources before HTML pages during the crawl.
Is there a property in nutch-site.xml, or a plugin, that can do this? If not, how could I implement it?
I welcome any suggestions, or source code snippets for creating a new Nutch plugin.
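To make the criterion concrete, here is a rough sketch in plain Java of the ordering I have in mind. It does not use any Nutch API. The class name, the method names, and the list of extensions are all hypothetical, and I assume the real logic would have to live in a plugin that influences how URLs are ranked before fetching.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ExtensionPriority {
    // Hypothetical list of extensions that should be fetched first.
    private static final Set<String> HIGH = new HashSet<>(
        Arrays.asList("jpg", "jpeg", "png", "gif", "pdf", "doc", "docx"));

    /** Returns the lower-cased extension of the URL path, or "" if there is none. */
    static String extensionOf(String url) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return ""; // malformed URL: treat as having no extension
        }
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must belong to the last path segment and not be its final character.
        if (dot <= slash || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** 0 means "fetch first", 1 means "fetch later". */
    static int rank(String url) {
        return HIGH.contains(extensionOf(url)) ? 0 : 1;
    }

    /** Returns a copy of the list with high-priority URLs moved to the front. */
    static List<String> prioritize(List<String> urls) {
        List<String> sorted = new ArrayList<>(urls);
        // List.sort is stable, so URLs with the same rank keep their original order.
        sorted.sort(Comparator.comparingInt(ExtensionPriority::rank));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://example.com/index.html",
            "http://example.com/img/logo.png",
            "http://example.com/about",
            "http://example.com/docs/report.pdf?download=1");
        for (String url : prioritize(urls)) {
            System.out.println(url);
        }
    }
}
```

With this ordering, the image and the PDF would come before the two HTML pages. What I do not know is where such a ranking should be hooked into Nutch.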
Best regards
--
Ing. Yulio Aleman Jimenez
Dpto. Soluciones Informáticas para Internet. CIDI
Universidad de las Ciencias Informáticas (UCI)