Posted to solr-user@lucene.apache.org by "Twomey, David" <da...@novartis.com> on 2011/07/28 03:12:14 UTC

colocated term stats

Given a query term, is it possible to get the top 10 collocated terms from the index?

i.e., return the top 10 terms that appear together with this term, ranked by doc count.

A plus would be to add some constraints on how near the terms are in the docs.




Re: colocated term stats

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Not sure if this will do what you want, but one way might be using facets.

Take the term you are interested in and apply it as an fq. The result 
set will then include only documents that contain that term. Also 
request facets on the relevant field for that result set: the top 10 
facet values are the top 10 terms in that result set, which means the 
top 10 terms that appear in documents together with your fq term. (You 
may need to request 11, because one of the facet values will be the 
same term you used in the fq.) You don't need to look at the actual 
documents at all (&rows=0), just the facet response.
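
As a rough sketch of the approach above, the request parameters and the handling of the facet response could look like the following in Python. The field name "text", the term "kinase", the host/port and the helper function names are all made up for illustration; only the Solr parameters themselves (fq, rows, facet, facet.field, facet.limit, facet.mincount, wt) are standard. It assumes the default JSON facet layout, where the values for a field come back as a flat [term, count, term, count, ...] list.

```python
from urllib.parse import urlencode


def cooccurrence_params(field, term, top_n=10):
    """Build Solr params: filter on the term, facet on the same field."""
    return {
        "q": "*:*",
        "fq": f'{field}:"{term}"',    # keep only docs containing the term
        "rows": 0,                    # no documents needed, just facets
        "facet": "true",
        "facet.field": field,
        "facet.limit": top_n + 1,     # one extra: the fq term itself shows up
        "facet.mincount": 1,
        "wt": "json",
    }


def top_cooccurring(facet_list, term, top_n=10):
    """Turn the flat [term, count, ...] list into pairs, minus the fq term."""
    pairs = zip(facet_list[0::2], facet_list[1::2])
    return [(t, c) for t, c in pairs if t != term][:top_n]


# Hypothetical host, core path, field and term.
url = "http://localhost:8983/solr/select?" + urlencode(
    cooccurrence_params("text", "kinase")
)
```

The list passed to top_cooccurring would be response["facet_counts"]["facet_fields"]["text"] from the parsed JSON. Note that facet values are indexed terms, so on an analyzed field they will be lowercased or stemmed forms, and the fq term may need to be compared in that form too.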

Make sense? Does that do what you want?
