Posted to solr-user@lucene.apache.org by Jeff Newburn <jn...@zappos.com> on 2009/03/30 17:53:39 UTC
Some Kind of Crazy Histogram
I have noticed that I can't seem to make sense of the histogram. For every
field the x-axis shows powers of 2, which makes no sense for things like brand
name. Am I looking at it wrong or is it having issues?
--
Jeff Newburn
Software Engineer, Zappos.com
jnewburn@zappos.com - 702-943-7562
Re: Some Kind of Crazy Histogram
Posted by Chris Hostetter <ho...@fucit.org>.
(for people who don't know, the schema browser and the Luke
handler return a "histogram" for each field)
: I have noticed that I can't seem to make sense of the histogram. For every
: field the x-axis shows powers of 2, which makes no sense for things like brand
: name. Am I looking at it wrong or is it having issues?
The histogram shows the distribution of term frequencies in an exponential
scale. the X axis is the upper bound of a term freq range (the ranges are
from one power of 2 to the next) and the height of the bar is the number
of terms whose frequency is in this range.
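
The bucketing described above can be sketched roughly as follows. This is only an illustration of the idea, not the Luke handler's actual code: the function names and the sample terms are made up, and placing a freq of 1 in its own bucket labeled 1 is an assumption of this sketch.

```python
from collections import Counter


def bucket_upper_bound(freq: int) -> int:
    """Return the smallest power of 2 that is >= freq.

    This is the x-axis label of the bar a term with this frequency falls
    under, e.g. freqs 5, 6, 7 and 8 all fall under the bar labeled 8.
    """
    if freq < 1:
        raise ValueError("term frequency must be at least 1")
    return 1 << (freq - 1).bit_length()


def histogram(term_freqs: dict[str, int]) -> dict[int, int]:
    """Map each bucket's upper bound to the number of terms in that bucket."""
    counts = Counter(bucket_upper_bound(f) for f in term_freqs.values())
    return dict(sorted(counts.items()))


# Hypothetical term frequencies, invented purely for illustration.
sample = {"alpha": 7, "beta": 7, "gamma": 7, "delta": 3, "epsilon": 2}
print(histogram(sample))  # {2: 1, 4: 1, 8: 3}
```

Here the three terms with a freq of 7 produce a bar of height 3 at x = 8, the term with a freq of 3 lands under 4, and the term with a freq of 2 lands under 2.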
it's easiest to make sense of when looking at a field with a small
number of distinct terms.
If you bring up the example schema (and optimize it to expunge deletions)
and then look at fields like "features" or "inStock" it starts to make
more sense.
when i look at the "features" field the top three terms have a freq of 7,
and the next three have a freq of 4, followed by some terms with a freq
of 4 ... so there are a total of six terms with a freq greater than 4 and
less than or equal to 8, so 8 is the highest freq shown in the histogram,
and it has a height of 6.
(Hmmm.... except the histogram seems to exclude things with a freq of 1
... i'm not sure if that's intentional or not. i'll open an issue.)
-Hoss