Posted to solr-user@lucene.apache.org by Christopher Ball <ch...@metaheuristica.com> on 2010/03/04 20:35:16 UTC

Count Sum of Term Occurrences?

How can I count the total number of occurrences of a specific term?

 

How can you get the total number of occurrences of a term across all
documents (i.e. the sum of the number of occurrences of a specific term in
each doc)?

 

For example, I have 3 documents: document #1 has "The green bird is flying",
document #2 has "The green Car has a green driver", and document #3 has "I
just love the color green, oh green, such a nice green, I wish I were
green".

 

I know the Terms component will give me the number of documents which have
the word green (in my example '3'), but I want the sum of occurrences (in my
example '7').

 

C


Re: Count Sum of Term Occurrences?

Posted by Ahmet Arslan <io...@yahoo.com>.

> How can I count the total number of a
> specific terms occurrences?
> 
>  
> 
> How can you get the total number of occurrences of a term
> across all
> documents (e.g. Sum of the number of occurrences of a
> specific term in each
> doc)? 
> 
>  
> 
> For example, I have 3 documents, document #1 has "The green
> bird is flying"
> document #2 has "The green Car has a green driver", and
> document #3 has "I
> just love the color green, oh green, such a nice green, I
> wish I were
> green". 
> 
>  
> 
> I know the Terms component will give me the number of
> documents which have
> the word green (in my example '3') but I want the sum
> occurrences (in my
> example '7').

So you want the collection frequency (total occurrences across all documents) instead of the document frequency (number of documents containing the term). You can modify TermsComponent.java to do that by adding this code snippet right after the line 'int docFreq = termEnum.docFreq();':

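// Sum the term's per-document frequency over all documents containing it.
TermDocs termDocs = rb.req.getSearcher().getReader().termDocs(theTerm);
int collectionFreq = 0;
while (termDocs.next()) {
    collectionFreq += termDocs.freq();
}
termDocs.close();
// Report collection frequency in place of document frequency.
docFreq = collectionFreq;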
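
To make the difference between the two numbers concrete, here is a small standalone sketch in plain Java that does not use Lucene or Solr at all. The class name TermFrequencies and its helper methods are made up for illustration, and the tokenizer (lowercase, split on non-letters) is only an assumption standing in for whatever analyzer the field really uses. It runs on the three example documents from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Solr or Lucene.
class TermFrequencies {

    // Crude stand-in for an analyzer: lowercase, then split on non-letters.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    // Number of occurrences of the term inside a single document.
    static int termFrequency(String doc, String term) {
        int count = 0;
        for (String token : tokenize(doc)) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Document frequency: how many documents contain the term at least once.
    static int documentFrequency(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (termFrequency(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    // Collection frequency: occurrences of the term summed over all documents.
    static int collectionFrequency(List<String> docs, String term) {
        int total = 0;
        for (String doc : docs) {
            total += termFrequency(doc, term);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "The green bird is flying",
            "The green Car has a green driver",
            "I just love the color green, oh green, such a nice green, I wish I were green");
        // Prints: docFreq=3 collectionFreq=7
        System.out.println("docFreq=" + documentFrequency(docs, "green")
            + " collectionFreq=" + collectionFrequency(docs, "green"));
    }
}
```

The inner loop in collectionFrequency plays the same role as the termDocs.next() / termDocs.freq() loop above: one step per matching document, adding that document's count to the running total.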