Posted to java-user@lucene.apache.org by thanh nguyen <ng...@yahoo.com.vn> on 2006/03/22 18:30:52 UTC
Repeat Second time: Extract important terms by programming??
Can anyone help me?
RE: Repeat Second time: Extract important terms by programming??
Posted by Edgar Meij <ed...@gmail.com>.
That's relatively easy, but Lucene doesn't do it out of the box.
Something like:
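// Needs: java.io.IOException, java.util.TreeMap, org.apache.lucene.index.{IndexReader, Term, TermFreqVector}
private TreeMap<Double, String> getTFIDF(String index, int documentID, String field) {
    // Maps TF-IDF score -> term; terms with identical scores overwrite each other.
    TreeMap<Double, String> tfIdfs = new TreeMap<Double, String>();
    try {
        IndexReader ir = IndexReader.open(index);
        try {
            // Null unless the field was indexed with term vectors enabled.
            TermFreqVector tv = ir.getTermFreqVector(documentID, field);
            if (tv == null) return tfIdfs;
            String[] terms = tv.getTerms();
            // getTermFreqs is a helper that is not shown in this message.
            double[] tf = getTermFreqs(tv);
            for (int i = 0; i < tv.size(); i++) {
                int docFreq = ir.docFreq(new Term(field, terms[i]));
                // Divide as doubles; integer division would truncate N / df.
                double idf = Math.log((double) ir.numDocs() / docFreq) / Math.log(2);
                tfIdfs.put(Double.valueOf(tf[i] * idf), terms[i]);
            }
        } finally {
            ir.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return tfIdfs;
}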
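One caveat with keying a TreeMap on the score is that two terms with the same score collide and only one survives. Below is a rough, Lucene-free sketch of the same tf * log2(N / df) weighting that keeps ties by returning a sorted list instead. The class name TfIdfSketch, the ScoredTerm type, and the rank method are made up for illustration, the input maps stand in for whatever you read from the index, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {

    /** A term paired with its score (hypothetical helper type). */
    public static final class ScoredTerm {
        public final String term;
        public final double score;

        ScoredTerm(String term, double score) {
            this.term = term;
            this.score = score;
        }
    }

    /**
     * Scores each term as tf * log2(numDocs / df), highest score first.
     *
     * @param termFreqs term -> frequency inside the document of interest
     * @param docFreqs  term -> number of documents containing the term
     * @param numDocs   total number of documents in the collection
     */
    public static List<ScoredTerm> rank(Map<String, Integer> termFreqs,
                                        Map<String, Integer> docFreqs,
                                        int numDocs) {
        List<ScoredTerm> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null || df <= 0) {
                continue; // no usable document frequency, so no score
            }
            // Floating-point division, then change of base to log2.
            double idf = Math.log((double) numDocs / df) / Math.log(2);
            scored.add(new ScoredTerm(e.getKey(), e.getValue() * idf));
        }
        // Sort descending by score; equal scores are all kept.
        scored.sort(Comparator.comparingDouble((ScoredTerm s) -> s.score).reversed());
        return scored;
    }
}
```

The first few entries of the returned list are then the "important" terms for that document. A term that occurs in every document gets an idf of zero and sinks to the bottom.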
Searching the mailing list might help as well:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/%3CA955EA1F8FE31749AEC8C998082F6C7C41A7AE@hai01.hippo.local%3E
And see also:
http://www.alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html
Edgar
> -----Original message-----
> From: thanh nguyen [mailto:ngay01032006@yahoo.com.vn]
> Sent: Wednesday, March 22, 2006 6:31 PM
> To: java-user@lucene.apache.org
> Subject: Repeat Second time: Extract important terms by programming??
>
> Can anyone help me?