Posted to java-user@lucene.apache.org by Manjula Wijewickrema <ma...@gmail.com> on 2014/06/24 10:53:17 UTC

Why is the bigram tf-idf 0?

Hi,

In my programme, I tried to select the most relevant document based on
bigrams.

The system gives me the following output:

{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1,
fine manjula/1, librari manjula/1, librarian sabaragamuwa/1, main
librari/2, manjula assist/4, manjula fine/1, manjula name/1, name
manjula/1, sabaragamuwa univers/3, univers main/2, univers sabaragamuwa/1}

The frequencies of the bigrams are also correctly identified by the system.
But the tf-idf scores of these bigrams are given as 0. However, the same
programme gives the correct tf-idf values for unigrams.
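
For reference, here is a minimal sketch of the tf-idf value I would expect for
one bigram. It assumes the classic Lucene DefaultSimilarity formulas,
tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)). The class name
BigramTfIdfSketch and the values numDocs = 3 and docFreq = 1 are made up for
illustration; only the frequency 4 of "manjula assist" comes from the output
above. The real hit.score also includes normalization factors, so this is
only the core term, not the exact score.

```java
public class BigramTfIdfSketch {

    // Term-frequency factor, assuming DefaultSimilarity: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse-document-frequency factor, assuming DefaultSimilarity:
    // 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Core tf-idf product for one term in one document (no norms applied).
    static double tfIdf(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int freq = 4;    // "manjula assist" occurs 4 times (from the output)
        int docFreq = 1; // hypothetical: number of documents with the bigram
        int numDocs = 3; // hypothetical: number of documents in the index
        System.out.println("tf-idf(\"manjula assist\") = "
                + tfIdf(freq, docFreq, numDocs));
    }
}
```

Under these formulas the product is positive whenever the term occurs at least
once in a matching document, so a value of exactly 0 may mean that the query
matched no document and the array entry was never written.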

The following is the code snippet that I wrote to determine the tf-idf of
bigrams.


********************************

for (int q1 = 1; q1 < NB + 1; q1++) { // NB: number of bigrams
    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    Analyzer analyzer = new WhitespaceAnalyzer();
    QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);

    // Parse the bigram at index pos[freqs.length - q1] as the query text
    Query query = queryParser.parse(terms[pos[freqs.length - q1]]);

    // (hits and it are not used below)
    Hits hits = indexSearcher.search(query);
    Iterator<Hit> it = hits.iterator();

    // Take the top 10 results and store the score of each hit
    TopDocs results = indexSearcher.search(query, 10);
    ScoreDoc[] hits1 = results.scoreDocs;
    for (ScoreDoc hit : hits1) {
        Document doc = indexSearcher.doc(hit.doc);
        tfidf[q1 - 1] = hit.score;
    }
}

***************************
Here, "hit.score" should give the tf-idf value of each bigram. Why is it
given as 0? Could someone please explain how to resolve this problem?

Thanks,
Manjula.