Posted to java-user@lucene.apache.org by thanh nguyen <ng...@yahoo.com.vn> on 2006/03/22 18:30:52 UTC

Repeat Second time: Extract important terms by programming??

Can anyone help me?

________________________________________________________ 
Do you use Yahoo!? 
Try the Yahoo! Vietnam homepage! 
http://vn.yahoo.com

RE: Repeat Second time: Extract important terms by programming??

Posted by Edgar Meij <ed...@gmail.com>.
That's relatively easy, but it isn't available out of the box... 

Something like:

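 import java.io.IOException;
 import java.util.*;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.index.TermFreqVector;
 public class TermExtractor {
   // Maps tf*idf score -> terms with that score, for one field of one document.
   // The field must have been indexed with term vectors (Field.TermVector.YES).
   public static TreeMap<Double, List<String>> getTFIDF(String index, int documentId, String field)
       throws IOException {
     IndexReader ir = IndexReader.open(index);
     try {
       TreeMap<Double, List<String>> tfIdfs = new TreeMap<Double, List<String>>();
       TermFreqVector tv = ir.getTermFreqVector(documentId, field);
       if (tv == null) return tfIdfs; // no term vector stored for this field
       String[] terms = tv.getTerms();
       int[] tf = tv.getTermFrequencies(); // raw counts (replaces the undefined getTermFreqs)
       int numDocs = ir.numDocs();
       for (int i = 0; i < terms.length; i++) {
         int docFreq = ir.docFreq(new Term(field, terms[i]));
         // idf = log2(N / df), computed in floating point; int division would truncate
         double score = tf[i] * (Math.log((double) numDocs / docFreq) / Math.log(2));
         List<String> bucket = tfIdfs.get(score); // group ties instead of overwriting them
         if (bucket == null) { bucket = new ArrayList<String>(); tfIdfs.put(score, bucket); }
         bucket.add(terms[i]);
       }
       return tfIdfs; // ascending by score; use descendingMap() or lastKey() for the top terms
     } finally {
       ir.close();
     }
   }
 }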
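
The scoring in getTFIDF can be tried without an index. The following is a minimal standalone sketch, assuming plain maps (hypothetical inputs) in place of TermFreqVector and IndexReader.docFreq(); the class name TfIdfSketch, the method topTerms and the sample counts are all made up for illustration. It uses the same weight, tf * log2(N / df), but keeps every term: a TreeMap<Double, String> keyed by score, as in the original snippet, silently drops a term whenever another term has the same score.

```java
import java.util.*;

// Standalone sketch of the tf*idf scoring used in getTFIDF: score = tf * log2(N / df).
// Plain maps (hypothetical inputs) stand in for TermFreqVector and IndexReader.docFreq().
public class TfIdfSketch {

    // Returns the k highest-scoring terms, best first. Every key of tf must also be in df.
    public static List<String> topTerms(Map<String, Integer> tf, Map<String, Integer> df,
                                        int numDocs, int k) {
        final Map<String, Double> score = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.get(e.getKey());
            // Cast before dividing: integer division would truncate N / df.
            double idf = Math.log((double) numDocs / docFreq) / Math.log(2);
            score.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<String>(score.keySet());
        // Sort by descending score; ties are broken alphabetically, so no term is lost.
        Collections.sort(terms, new Comparator<String>() {
            public int compare(String a, String b) {
                int c = Double.compare(score.get(b), score.get(a));
                return c != 0 ? c : a.compareTo(b);
            }
        });
        return terms.subList(0, Math.min(k, terms.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new HashMap<String, Integer>();
        tf.put("lucene", 3); tf.put("index", 2); tf.put("the", 10);
        Map<String, Integer> df = new HashMap<String, Integer>();
        df.put("lucene", 2); df.put("index", 8); df.put("the", 16);
        System.out.println(topTerms(tf, df, 16, 2));
    }
}
```

Running main prints [lucene, index]: "the" occurs in all 16 documents, so log2(16/16) = 0 and its score is 0 however often it appears. Note that these weights are not Lucene's ranking scores; Lucene's DefaultSimilarity uses sqrt(freq) for tf and 1 + ln(numDocs / (docFreq + 1)) for idf.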

Searching the mailing list archive might help as well: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/%3CA955EA1F8FE31749AEC8C998082F6C7C41A7AE@hai01.hippo.local%3E 
See also: http://www.alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html 


Edgar

> -----Original message-----
> From: thanh nguyen [mailto:ngay01032006@yahoo.com.vn] 
> Sent: Wednesday, March 22, 2006 6:31 PM
> To: java-user@lucene.apache.org
> Subject: Repeat Second time: Extract important terms by programming??
> 
> Can anyone help me?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org