Posted to java-user@lucene.apache.org by Manjula Wijewickrema <ma...@gmail.com> on 2014/06/24 10:53:17 UTC
Why is the bigram tf-idf 0?
Hi,
In my programme, I tried to select the most relevant document based on
bigrams.
The system gives me the following output:
{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1,
fine manjula/1, librari manjula/1, librarian sabaragamuwa/1, main
librari/2, manjula assist/4, manjula fine/1, manjula name/1, name
manjula/1, sabaragamuwa univers/3, univers main/2, univers sabaragamuwa/1}
The system identifies the frequencies of the bigrams correctly, but it
gives the tf-idf scores of these bigrams as 0. The same programme gives
the correct tf-idf values for unigrams.
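As a point of comparison, the tf and idf factors can be worked out by hand. The following is a minimal sketch, assuming the classic DefaultSimilarity definitions (tf is the square root of the frequency, idf is 1 + ln(numDocs / (docFreq + 1))). The class name, numDocs and docFreq are hypothetical values chosen for illustration; only the frequency of 4 comes from the output above. Note that hit.score also folds in other factors such as norms, so it is not expected to equal this product exactly, but a bigram that really matches should not score 0.

```java
// Sketch of the tf and idf factors as defined by Lucene's classic DefaultSimilarity.
// numDocs and docFreq below are made-up values, not taken from the original index.
class BigramTfIdfSketch {

    // tf factor: square root of the term's frequency in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf factor: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int freq = 4;     // "manjula assist/4" from the output above
        int numDocs = 3;  // hypothetical collection size
        int docFreq = 1;  // hypothetical: the bigram occurs in one document

        double weight = tf(freq) * idf(docFreq, numDocs);
        System.out.println("tf-idf weight = " + weight); // strictly positive
    }
}
```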
The following is the code snippet that I wrote to determine the tf-idf of
bigrams:
********************************
for(int q1=1; q1<NB+1; q1++){ //NB-Number of Bigrams
    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    Analyzer analyzer = new WhitespaceAnalyzer();
    QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
    Query query = queryParser.parse(terms[pos[freqs.length-q1]]);
    Hits hits = indexSearcher.search(query);
    Iterator<Hit> it = hits.iterator();
    TopDocs results=indexSearcher.search(query,10);
    ScoreDoc[] hits1=results.scoreDocs;
    for(ScoreDoc hit:hits1){
        Document doc=indexSearcher.doc(hit.doc);
        tfidf[q1-1]=hit.score;
    }
}
***************************
Here, "hit.score" should give the tf-idf value of each bigram. Why is it
given as 0? Could someone please explain how to resolve this problem?
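One possible cause, offered as an assumption rather than a confirmed diagnosis: if the index stores each bigram as a single term, a query parsed with a whitespace-based analyzer may be split into two unigram terms that match nothing. The inner loop then never runs and the tfidf entry keeps its default value of 0. The sketch below simulates this in plain Java without any Lucene calls; the class name, the helper methods and the contents of the simulated index are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation (no Lucene calls) of a possible term mismatch.
class BigramLookupSketch {

    // Assumption: the index holds each bigram as ONE term, space included.
    static final Set<String> INDEXED_TERMS = new HashSet<>(Arrays.asList(
            "manjula assist", "sabaragamuwa univers", "main librari"));

    // Mimics a whitespace-splitting query: each word becomes its own term.
    static List<String> splitOnWhitespace(String query) {
        return Arrays.asList(query.trim().split("\\s+"));
    }

    // Counts how many of the given query terms exist in the simulated index.
    static int matches(List<String> queryTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (INDEXED_TERMS.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String bigram = "manjula assist";
        float[] tfidf = new float[1]; // Java initialises every entry to 0.0f

        int splitMatches = matches(splitOnWhitespace(bigram)); // "manjula", "assist"
        int wholeMatches = matches(Arrays.asList(bigram));     // "manjula assist"

        // With no matches there are no hits, so the score is never assigned.
        if (splitMatches > 0) {
            tfidf[0] = 1.0f;
        }

        System.out.println("split query matches: " + splitMatches); // 0
        System.out.println("whole-term matches:  " + wholeMatches); // 1
        System.out.println("tfidf[0] = " + tfidf[0]);               // 0.0
    }
}
```

If this is what is happening, building the query directly as new TermQuery(new Term(FIELD_CONTENTS, bigram)) instead of going through QueryParser may keep the bigram as a single term; bigram here is a placeholder for the bigram string being scored.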
Thanks,
Manjula.