Posted to user@nutch.apache.org by Awei <wo...@yahoo.com> on 2007/12/02 15:11:59 UTC
how to get sets of urls and terms for tf/idf
Dear All,
I am new to Nutch. I have three questions after running an intranet
crawl:
1) How can I get all the URLs after crawling?
2) How can I get all the terms after crawling and indexing?
3) How can I get the top N most frequent terms for a given URL (broken
down by field)?
I need these three results to compute tf/idf values.
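To make clear what I am after, here is a minimal sketch in Python of the computation, assuming the URLs and the per-URL term lists have already been extracted somehow. The sample data and the function names tf_idf and top_n_terms are hypothetical and are not part of Nutch or Lucene. The weighting used is the plain variant, tf = count / document length and idf = log(N / df).

```python
import math
from collections import Counter

# Hypothetical sample data: URL -> list of terms extracted from that page.
docs = {
    "http://example.com/a": ["nutch", "crawl", "nutch", "index"],
    "http://example.com/b": ["lucene", "index"],
}


def top_n_terms(docs, url, n):
    """Return the n most frequent terms of one URL as (term, count) pairs."""
    return Counter(docs[url]).most_common(n)


def tf_idf(docs):
    """Return URL -> {term: tf-idf score} for every URL in docs."""
    n_docs = len(docs)

    # Document frequency: the number of URLs whose text contains the term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for url, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        scores[url] = {
            # tf = relative frequency within the page, idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores
```

With this weighting, a term that occurs in every URL gets a score of 0, because log(N / N) = 0. What I am missing is how to get the inputs (the URL set and the term lists) out of the crawl and the index.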
I managed to solve the first question after reading this forum, but I am
completely stuck on the other two.
Can anybody help? Thanks a lot in advance.