Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/04/08 00:47:54 UTC

Re: Some Kind of Crazy Histogram

(for people who don't know, the schema browser and the Luke 
handler return a "histogram" for each field)

: I have noticed that I can't seem to make sense of the histogram.  For every
: field the x-axis shows powers of 2 which make no sense for things like brand
: name.  Am I looking at it wrong or is it having issues?

The histogram shows the distribution of term frequencies on an exponential 
scale.  The X axis is the upper bound of a term freq range (each range runs 
from one power of 2 to the next) and the height of the bar is the number 
of terms whose frequency falls in that range.
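
As a rough illustration of the bucketing described above, here is a minimal
Python sketch. It is not the code the Luke handler actually runs: the function
names and the sample term frequencies are made up for the example, and it
assumes each bucket's upper bound is inclusive (so a freq of exactly 8 lands
in the 8 bucket).

```python
from collections import Counter


def bucket_upper_bound(freq: int) -> int:
    """Return the smallest power of 2 that is >= freq (assumed bucket rule)."""
    if freq < 1:
        raise ValueError("term frequency must be at least 1")
    # (freq - 1).bit_length() is the exponent of the next power of 2 >= freq
    return 1 << (freq - 1).bit_length()


def term_freq_histogram(term_freqs: dict) -> dict:
    """Map each bucket upper bound to the number of terms falling in it."""
    counts = Counter(bucket_upper_bound(f) for f in term_freqs.values())
    return dict(sorted(counts.items()))


# Hypothetical term -> freq data for a field with few distinct terms
sample = {"a": 7, "b": 7, "c": 6, "d": 4, "e": 3, "f": 2, "g": 1}
print(term_freq_histogram(sample))  # {1: 1, 2: 1, 4: 2, 8: 3}
```

In this sample, three terms have a freq greater than 4 and less than or equal
to 8, so the bar labelled 8 has a height of 3.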

It's easiest to make sense of when looking at a field with a small 
number of distinct terms.

If you bring up the example schema (and optimize it to expunge deletions) 
and then look at fields like "features" or "inStock" it starts to make 
more sense.

when i look at the "features" field the top three terms have a freq of 7, 
and the next three have a freq of 4, followed by some terms with a freq 
of 4 ... so there are a total of six terms with a freq greater than 4 and 
less than or equal to 8, so 8 is the highest freq shown in the histogram, 
and it has a height of 6.

(Hmmm.... except the histogram seems to exclude things with a freq of 1 
... i'm not sure if that's intentional or not. i'll open an issue.)


-Hoss