Posted to user@nutch.apache.org by Awei <wo...@yahoo.com> on 2007/12/02 15:11:59 UTC

how to get sets of urls and terms for tf/idf

Dear All, 

I am new to Nutch. After running an intranet crawl, I have three
questions:

1) How could I get all the urls after crawling? 

2) How could I get all the terms after crawling and indexing? 

3) How could I get the top N most frequent terms for a given URL
(separately for different fields)?

I need these three results to compute tf/idf values.
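
To make the goal concrete, here is a rough Python sketch of the calculation
these three results would feed. It assumes the terms for each URL have
already been extracted from the index into a plain dictionary. The names
doc_terms, tf_idf and top_n_terms are made up for illustration and are not
Nutch APIs, and the tf and idf definitions used (relative frequency, natural
log of N/df) are only one common variant.

```python
import math
from collections import Counter


def tf_idf(doc_terms):
    """Return {url: {term: tf-idf score}}.

    doc_terms is a hypothetical input: it maps each crawled URL to the
    list of terms found on that page (for one field).
    """
    n_docs = len(doc_terms)

    # Document frequency: number of URLs whose page contains the term.
    df = Counter()
    for terms in doc_terms.values():
        df.update(set(terms))

    scores = {}
    for url, terms in doc_terms.items():
        counts = Counter(terms)
        total = len(terms)
        scores[url] = {
            # tf = relative frequency in this page, idf = ln(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_n_terms(doc_terms, url, n):
    """Return the n most frequent (term, count) pairs for one URL."""
    return Counter(doc_terms[url]).most_common(n)


# Made-up sample data standing in for the crawl output.
sample = {
    "http://example.com/a": ["nutch", "crawl", "nutch"],
    "http://example.com/b": ["crawl", "index"],
}
sample_scores = tf_idf(sample)
sample_top = top_n_terms(sample, "http://example.com/a", 1)
```

In this sample, "crawl" occurs on both pages, so its idf is ln(2/2) = 0 and
its score is zero on both URLs.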

I managed to solve the first question after reading this forum, but I am
completely lost on the other two.

Can anybody give me some help? Thanks a lot in advance. 
