Posted to user@nutch.apache.org by Salman Rasheed <sa...@hotmail.com> on 2009/02/10 16:08:23 UTC

URL Normalizer - Linkdb

I'm crawling an internal domain and trying to normalize URLs for the LinkDB scope. The goal is to convert the internal domain names to external domain names that can be presented in search results. For this I have a custom plug-in that replaces URL hostnames based on a map. The plug-in registers fine, but for some reason it is not invoked for the LinkDB scope. It is invoked fine for the other scopes (outlink, crawldb, etc.).
What am I missing? Also, is this the best way to transform search result URLs?

Thanks,
Salman
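
For reference, the hostname-mapping step described above can be sketched in plain Java roughly as follows. This is only a minimal illustration, not the actual plug-in: the class name HostRewriter, its rewrite method, and the example hostnames are all hypothetical, and no Nutch classes are used. In a real plug-in, logic like this would presumably sit inside the normalizer (or, as suggested in the reply, in an indexing-time filter).

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Nutch): swaps internal hostnames for external ones.
class HostRewriter {
    private final Map<String, String> hostMap;

    HostRewriter(Map<String, String> hostMap) {
        // Defensive copy so later changes to the caller's map have no effect here.
        this.hostMap = new HashMap<>(hostMap);
    }

    // Returns the URL with its host replaced when a mapping exists.
    // Unmapped hosts and unparseable URLs are returned unchanged.
    String rewrite(String urlString) {
        try {
            URL url = new URL(urlString);
            String external = hostMap.get(url.getHost().toLowerCase(Locale.ROOT));
            if (external == null) {
                return urlString;
            }
            // getFile() is the path plus query string; the fragment is handled separately.
            URL rewritten = new URL(url.getProtocol(), external, url.getPort(), url.getFile());
            String ref = url.getRef();
            return ref == null ? rewritten.toString() : rewritten.toString() + "#" + ref;
        } catch (MalformedURLException e) {
            return urlString;
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("intranet.example.local", "www.example.com"); // assumed example mapping
        HostRewriter rewriter = new HostRewriter(map);
        System.out.println(rewriter.rewrite("http://intranet.example.local/docs/a.html?x=1"));
    }
}
```

Note that this sketch keeps any explicit port from the internal URL; whether that is appropriate depends on how the external site is exposed.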

Re: URL Normalizer - Linkdb

Posted by KSY <ks...@yahoo.com>.
Do not manipulate the LinkDB directly on its own; doing so will mess up the
whole crawling process. The LinkDB keeps track of a graph of all the links
from the URLs.

Use IndexingFilter instead -- as suggested in the following post:

http://www.nabble.com/URL-Transformation-td21982403.html

-- 
View this message in context: http://www.nabble.com/URL-Normalizer---Linkdb-tp21935641p22461426.html
Sent from the Nutch - User mailing list archive at Nabble.com.