Posted to java-user@lucene.apache.org by "Zhang, Lisheng" <Li...@broadvision.com> on 2008/02/26 18:32:17 UTC

How to find most popular terms quickly?

Hi,

I have a very large number of documents indexed, and one field is Brand
(untokenized). I need to find the most popular brand (the brand used by
the most docs). One way is:

1) open an IndexReader.
2) call terms() to get all terms, keeping only the terms in field Brand.
3) call termDocs(Term) to get the docs containing each Brand term.
4) count which term is used by the most docs in the above result.

Is this the most efficient way?

Thanks very much for any help, Lisheng

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to find most popular terms quickly?

Posted by Chris Hostetter <ho...@fucit.org>.
: 1) open IndexReader.
: 2) call terms() to get all terms, then filter out terms in field Brand.
: 3) call termDocs(Term) to get Docs having each special Brand.
: 4) count which term is used by most docs from above result.
: 
: Is this the most efficient way?

pretty much ... take a look at the HighFreqTerms class in the 
"miscellaneous" contrib.
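
A rough sketch of the counting step (step 4 in the quoted list), in case it
helps. It is not the HighFreqTerms source and uses no Lucene classes: TopTerms,
TermCount and topByDocFreq are made-up names for illustration, and the input
map of term text to document count is assumed to have been filled already by
walking the terms of the Brand field. In Lucene releases of this era
TermEnum.docFreq() reports the number of documents for the current term, so
collecting those counts may not need a termDocs(Term) pass at all. The sketch
keeps only the n best terms in a small min-heap instead of sorting every term:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Lucene.
final class TopTerms {

    /** One term of a field plus the number of documents that contain it. */
    static final class TermCount {
        final String text;
        final int docFreq;

        TermCount(String text, int docFreq) {
            this.text = text;
            this.docFreq = docFreq;
        }
    }

    /**
     * Returns the n terms with the highest document counts, highest first.
     * docFreqs maps term text to document count (assumed to be collected
     * beforehand, e.g. while enumerating the terms of one field).
     */
    static List<TermCount> topByDocFreq(Map<String, Integer> docFreqs, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<TermCount> byDocFreq =
                Comparator.comparingInt((TermCount t) -> t.docFreq);

        // Min-heap: the head is always the weakest of the current top n.
        PriorityQueue<TermCount> heap = new PriorityQueue<>(n, byDocFreq);
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (heap.size() < n) {
                heap.add(new TermCount(e.getKey(), e.getValue()));
            } else if (e.getValue() > heap.peek().docFreq) {
                heap.poll(); // drop the weakest term to make room
                heap.add(new TermCount(e.getKey(), e.getValue()));
            }
        }

        // Copy out and order from most to least frequent.
        List<TermCount> result = new ArrayList<>(heap);
        result.sort(byDocFreq.reversed());
        return result;
    }
}
```

Calling TopTerms.topByDocFreq(counts, 1) would give the single most popular
brand. The heap never holds more than n entries, so memory stays small even
when the field has a very large number of distinct terms.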


-Hoss

