Posted to solr-user@lucene.apache.org by uyilmaz <uy...@vivaldi.net.INVALID> on 2020/11/03 18:04:22 UTC

Solr tag cloud - words and counts

I have been trying to find a way to do this in Solr for a while: perform a query and, for a text_general field in the result set, find the number of occurrences of each term.
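For reference, the brute-force version of what is being asked is to fetch every matching document and count tokens on the client. This is a minimal sketch under stated assumptions: the field is stored and returned in `fl`, the documents (hard-coded here) have already been fetched, and the field name `body` and the regex tokenizer are hypothetical stand-ins for the real text_general analysis. It gives true per-occurrence counts, but it has to pull every hit, so it does not scale to millions of matching documents.

```python
import re
from collections import Counter

# Crude tokenizer: lowercase, then take runs of word characters.
# This only approximates a text_general analyzer (no stop words,
# no stemming, no synonyms); it is an assumption, not Solr's analysis.
TOKEN_RE = re.compile(r"\w+")


def term_counts(docs, field):
    """Count every occurrence of every term of `field` across `docs`.

    `docs` is assumed to be the list of result documents (dicts)
    returned by a query with `field` stored and included in `fl`.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field, "")
        # Multi-valued fields come back as lists; join them into one text.
        if isinstance(value, list):
            value = " ".join(value)
        counts.update(TOKEN_RE.findall(value.lower()))
    return counts


# Hypothetical result documents.
docs = [
    {"id": "1", "body": "Solr tag cloud"},
    {"id": "2", "body": "tag counts, tag cloud"},
]
print(term_counts(docs, "body").most_common(2))  # [('tag', 3), ('cloud', 2)]
```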

- I tried the Terms Component, but it can't restrict the set of documents with a query.

- I tried faceting on the field. Since it's a text_general field it doesn't have docValues, and the cardinality is very high (millions of documents times tens of words in each field), so it works but is very slow and sometimes times out.

- I tried the significantTerms streaming expression, but it doesn't do what I'm looking for. It returns words that occur frequently in the result set but less frequently outside it, so it is better at finding frequency anomalies than at giving plain counts.
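One caveat about the faceting approach: a terms facet reports, for each term, the number of matching documents that contain it, not the number of times it occurs. The sketch below only builds a JSON Request API body with a JSON Facet API terms facet; it assumes the body would be POSTed to the collection's `/query` endpoint, and the query string, field name `body` and facet name `top_terms` are hypothetical.

```python
import json


def terms_facet_request(query, field, top_n=100):
    """Build a JSON Request API body with a terms facet over `field`.

    "limit": 0 suppresses the document list, since only the facet is needed.
    Each bucket's "count" in the response is a document count for the term,
    not a count of its occurrences.
    """
    return {
        "query": query,
        "limit": 0,
        "facet": {
            "top_terms": {  # hypothetical facet name
                "type": "terms",
                "field": field,
                "limit": top_n,  # number of buckets (terms) to return
                "mincount": 1,
            }
        },
    }


body = terms_facet_request("category:news", "body", top_n=50)
print(json.dumps(body, indent=2))
```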

Do you have any suggestions?

Regards

-- 
uyilmaz <uy...@vivaldi.net>

Re: Solr tag cloud - words and counts

Posted by Walter Underwood <wu...@wunderwood.org>.
For a tag cloud, the anomalous words are what you want. If you choose the most common words, then every tag cloud will have the same words. It will look like:

the, be, to, of, and, a, in, that, have, I, it, for, not, on, with, ...
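To illustrate the point with toy data: ranking by raw counts puts a word like "the" at the top, while a foreground-versus-background ratio pushes it down. This is a minimal sketch over a hypothetical tokenized corpus; the add-one smoothed ratio of document-frequency rates is a simple stand-in score chosen for illustration and is not the formula significantTerms actually uses.

```python
from collections import Counter


def doc_freq(docs):
    """Number of documents each term appears in (docs are token lists)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def anomaly_scores(foreground, background):
    """Score terms by how much more common they are in the foreground.

    Illustrative only: foreground doc-frequency rate divided by the
    background doc-frequency rate, with add-one smoothing so the
    denominator is never zero. Not Solr's significantTerms scoring.
    """
    fg, bg = doc_freq(foreground), doc_freq(background)
    n_fg, n_bg = len(foreground), len(background)
    return {
        term: (fg[term] / n_fg) / ((bg[term] + 1) / (n_bg + 1))
        for term in fg
    }


# Hypothetical whole collection and the subset matched by a query.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "solr", "index"],
    ["the", "solr", "query"],
]
results = corpus[2:]

scores = anomaly_scores(results, corpus)
print(sorted(scores, key=scores.get, reverse=True))
# ['solr', 'index', 'query', 'the']
```

In the results, "the" and "solr" each appear in both documents, so raw counts cannot tell them apart; the ratio ranks "the" last because it is just as common across the whole collection.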

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
