Posted to user@nutch.apache.org by Felix Zimmermann <fe...@gmx.de> on 2009/02/20 12:11:17 UTC

How to index content page of RSS-Feeds with pubDate metadata?

Hi,

I crawl RSS feeds with Nutch using depth=2. After that, I index the second
level of the crawl with solrindex so that I get the complete content of each
article rather than only the one or two lines contained in the feed.

Of course, when I index the article page itself, the pubDate information
from the RSS feed is lost.

How can I carry the pubDate metadata over from the first crawl/segment (the
feed) so that it is indexed together with the second crawl/segment (the
article pages)?
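
To make the goal concrete, here is a minimal sketch in plain Java of the
join I have in mind. It does not use any Nutch or Solr API: the class
PubDateJoin, its methods, the example URL and the date are all made up for
illustration, and plain maps stand in for the parsed feed and for the index
document. It also assumes that the link of a feed item is identical to the
URL of the article page.

```java
import java.util.HashMap;
import java.util.Map;

public class PubDateJoin {

    // Hypothetical stand-in for the first-level parse result:
    // maps each feed item's link to its pubDate string.
    static Map<String, String> pubDatesFromFeed() {
        Map<String, String> byLink = new HashMap<>();
        byLink.put("http://example.com/article-1",
                   "Fri, 20 Feb 2009 10:00:00 GMT");
        return byLink;
    }

    // Hypothetical stand-in for building the index document of an
    // article page fetched at the second level. The pubDate field is
    // added only if the feed had an item pointing at this URL.
    static Map<String, String> buildDoc(String url, String content,
                                        Map<String, String> pubDates) {
        Map<String, String> doc = new HashMap<>();
        doc.put("url", url);
        doc.put("content", content);
        String pubDate = pubDates.get(url);
        if (pubDate != null) {
            doc.put("pubDate", pubDate);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> pubDates = pubDatesFromFeed();
        Map<String, String> doc = buildDoc(
                "http://example.com/article-1",
                "Full text of the article",
                pubDates);
        System.out.println(doc.get("pubDate"));
    }
}
```

What I am missing is the place in Nutch where such a lookup keyed by the
article URL could happen at indexing time.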

Thanks for your help,

Felix.