Posted to user@nutch.apache.org by Tuğcem Oral <tu...@gmail.com> on 2014/11/11 14:17:58 UTC

Nutch 1.6 find original url or redirected ones

Hi all,

I wonder how I can find the original URL after it has been redirected.
The original URLs are all in the seed list, but I cannot tell which URL
was redirected to which. In the Fetcher phase I expected to read it from
Nutch.WRITABLE_REPR_URL_KEY, but that value is overwritten by the
redirected URL.

Any suggestions on how to read them from the crawldb, segments, or linkdb?

PS: I only crawl the first-level pages (depth: 1) of the seed list.

Best,
Tugcem.

-- 
TO