Posted to user@nutch.apache.org by chee wu <ch...@gmail.com> on 2007/01/10 15:13:51 UTC

How to retrieve and store the date information of a page

Hi,
HTML pages from BBS forums and blogs often contain information such as the post date and the modified date. I want to extract this information and provide the ability to filter search results by date. To implement this, I am unsure about the two points below:
1. When should the information be retrieved?
   During the parsing process or the indexing process?

2. I would like the date information to be kept in memory, so where should it be stored?
   In the index? That seems impossible, because indexes are organized by tokens, not by documents.

Any suggestions for this problem are welcome!

Thanks