Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/07/19 01:39:20 UTC
Re: solr scale on trie fields

There are a few things here that i think you might be misunderstanding...

: function, but i read in solr book (Solr 1.4 enterprise search server by Eric
: Pugh and David Smiley) that "*scale will traverse the entire document set
: and evaluate the function to determine the smallest and largest values for
: each query invocation, and it is not cached " *. What makes me ask two
: questions:
: 
:    1. Is this also true for TrieFields (such as solr.TrieIntField), because
:    as far as I understand it suppose to have the values sorted in some manner,
:    so checking for the min and max val should happen in constant time
:    complexity.

Trie fields are encoded such that the "min" numeric value gets the "min"
Term value, and the "max" numeric value gets the "max" Term value, but  
they are still just Terms, so finding the "max" Term value does require a  
scan of the TermEnumerator.
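
To illustrate why the "max" costs more than the "min", here is a toy sketch in Python. It is not Lucene's API: the function name and the sample term values are made up, and it only assumes that terms arrive in sorted order through a forward-only iterator.

```python
from typing import Iterable, Optional, Tuple


def min_and_max_term(term_enum: Iterable[str]) -> Optional[Tuple[str, str, int]]:
    """Return (min_term, max_term, terms_visited) for a sorted, forward-only stream."""
    iterator = iter(term_enum)
    try:
        first = next(iterator)  # the smallest term is available after one step
    except StopIteration:
        return None  # the field has no terms at all
    last = first
    visited = 1
    for term in iterator:  # the largest term is only known once the stream ends
        last = term
        visited += 1
    return first, last, visited


# Hypothetical, already-sorted term values for a single field.
terms = ["0003", "0017", "0042", "0108"]
print(min_and_max_term(iter(terms)))
```

The number of terms visited grows with the number of distinct values in the field, which is why the lookup is not constant time in this model.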

but that's not what we're talking about with the "scale" function.  

scale(...) is generic -- it can be used to scale the output of *any* 
function, not just field values, so it can't use generic Term seeking 
code, because a client could specify "scale(map(myTrieField,0,0,5),1,10)" 
just as easily as they could write "scale(myTrieField,1,10)"
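
A rough Python model of that point, as a sketch only: the helper names (field, map_range, scale) and the sample documents are hypothetical, and the handling of a zero-width source range is an assumption, not a statement about Solr's implementation. What it shows is that scale only sees a function from document to value, so it has to evaluate that function on every document to learn the source min and max.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]
ValueFunc = Callable[[Doc], float]


def field(name: str) -> ValueFunc:
    """Value source that reads a numeric field from a document."""
    return lambda doc: doc[name]


def map_range(inner: ValueFunc, lo: float, hi: float, target: float) -> ValueFunc:
    """Model of map(x,lo,hi,target): values inside [lo, hi] become target."""
    def apply(doc: Doc) -> float:
        value = inner(doc)
        return target if lo <= value <= hi else value
    return apply


def scale(inner: ValueFunc, docs: List[Doc],
          target_min: float, target_max: float) -> List[float]:
    """Model of scale(x,min,max): needs a full pass to find the source range."""
    values = [inner(doc) for doc in docs]  # evaluates the function on every document
    if not values:
        return []
    source_min, source_max = min(values), max(values)
    spread = source_max - source_min
    # Assumption: a zero-width source range maps everything to target_min.
    factor = (target_max - target_min) / spread if spread else 0.0
    return [(v - source_min) * factor + target_min for v in values]


docs = [{"myTrieField": 0}, {"myTrieField": 4}, {"myTrieField": 8}]
print(scale(field("myTrieField"), docs, 1, 10))
print(scale(map_range(field("myTrieField"), 0, 0, 5), docs, 1, 10))
```

In the second call the smallest input to scale is 4, not 0, because map has already rewritten the 0 to 5. That minimum can't be read off the field's Terms; it only falls out of evaluating the wrapped function.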

:    2. why are the results are not cached?!?! is there any way to defined
:    them to be cached?

In the general case, it's not clear how/when/where this information could 
be cached -- in your use case it may seem straightforward: you are 
scaling the values of a single field, so you think the min/max value for 
that field should be cached, but as i mentioned, functions in solr are 
entirely general purpose.  caching the min/max values for every arbitrary 
function that might ever be used as the input to the scale function isn't 
really a good idea.

That said: there would likely be some definite value in adding new 
"minterm" and "maxterm" functions that would take as argument explicit 
field names (not general functions) which would likely be able to more 
efficiently compute those values (and then be more efficient when scaling) 
but as mentioned there is still the issue of finding the "max" term value 
requiring iteration.

some work is being done at a lower level to better encode these kinds of 
field/term stats in the index, and i suspect you'll see people more eager 
to add functions like that when that underlying work is done.



-Hoss