Posted to user@nutch.apache.org by chee wu <ch...@gmail.com> on 2007/01/10 15:13:51 UTC
How to retrieve and store the date information of a page
Hi,
HTML pages from BBSes or blogs often contain information such as the post date and the modified date. I want to extract this information and let users filter search results by date. To implement this, I am unsure about the two points below:
1. When should the information be retrieved?
During the parsing process or the indexing process?
2. I would like the date information to be kept in memory, so where should it be stored?
In the index? That does not seem possible, because indexes are organized by tokens, not by documents.
Any suggestions for this problem are welcome!
Thanks