Posted to java-user@lucene.apache.org by Jeff Rodenburg <je...@gmail.com> on 2006/03/01 18:03:17 UTC

Re: Hacking proximity search: looking for feedback

Thanks to everyone for the replies.  I'm going to try several of these
approaches with equivalent data sets and run some side-by-side tests.

No timeframe guarantees here, but I'll report back with the different
approaches and the test results.

cheers,
-- j


On 2/28/06, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : Very good points, I hadn't considered the term frequency of the digits
> : affecting scoring.  As an aside, can that aspect of the score be ignored
> : for these fields?
>
> The easiest way is to use a boost that is so low it's insignificant, or
> you could subclass TermQuery and override getSimilarity to return a
> DelegateSimilarity which wraps the real instance and returns constant
> values for things like tf() and idf() ... but I'm 95% sure that using a
> RangeFilter (or a ConstantScoreRangeQuery) is going to be faster than all
> of those TermQueries no matter what.
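To make the "constant tf/idf" idea concrete, here is a minimal plain-Java sketch of the delegation pattern described above. `SimpleSimilarity`, `DefaultLikeSimilarity` and `ConstantTfIdfSimilarity` are hypothetical names, and this is not Lucene's Similarity API (the real class has more methods and different signatures); only the formulas are modeled on DefaultSimilarity's tf, idf and lengthNorm. The wrapper pins tf() and idf() to constants and forwards everything else to the wrapped instance.

```java
// Hypothetical, simplified stand-in for a scoring "similarity".
// NOT Lucene's Similarity class; it only mirrors the shape of the idea.
interface SimpleSimilarity {
    float tf(float freq);
    float idf(int docFreq, int numDocs);
    float lengthNorm(int numTerms);
}

// Formulas modeled on Lucene's DefaultSimilarity:
// tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms).
class DefaultLikeSimilarity implements SimpleSimilarity {
    public float tf(float freq) { return (float) Math.sqrt(freq); }

    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public float lengthNorm(int numTerms) { return (float) (1.0 / Math.sqrt(numTerms)); }
}

// Wraps a real instance: tf() and idf() become constants, everything else is delegated.
class ConstantTfIdfSimilarity implements SimpleSimilarity {
    private final SimpleSimilarity delegate;

    ConstantTfIdfSimilarity(SimpleSimilarity delegate) { this.delegate = delegate; }

    // Any match counts the same, no matter how often the term occurs.
    public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }

    // Rare and common terms (e.g. rare and common digits) weigh the same.
    public float idf(int docFreq, int numDocs) { return 1.0f; }

    public float lengthNorm(int numTerms) { return delegate.lengthNorm(numTerms); }
}
```

With the wrapper, a term found in 1 of 1000 documents and one found in 900 of 1000 contribute the same idf, which removes the digit-frequency effect being asked about. As noted above, a RangeFilter is still expected to be the faster route overall.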
>
> : I need to spend more time with FunctionQuery, I haven't given it the
> : attention it deserves.
>
> I would start by trying out an apples-to-apples comparison of your current
> approach with one where your index only has one indexed field each for
> long/lat that uses ConstantScoreRangeQuery to do the boxing.  Compare both
> the size of the resulting indexes, the memory footprint while open, and
> the time spent executing comparable queries.  You should probably compare
> queries that involve both large boxes and small boxes, and, depending on
> the usage pattern you expect, consider caching your Filters if many boxes
> will be reused frequently.
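If the same boxes recur, their filter results can be cached. Below is a minimal sketch, assuming a single unchanging index snapshot and a caller-supplied function that computes the BitSet of matching document ids for a box key. `LruFilterCache` is a hypothetical name; Lucene's own CachingWrapperFilter, which caches a Filter's bits per IndexReader, is the usual starting point. Least-recently-used eviction keeps memory bounded when many distinct boxes are queried.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache for box filters. The key identifies a box (for example
// "minLat,maxLat,minLon,maxLon"); the value is the BitSet of matching doc ids.
// Entries are only valid for one unchanging index snapshot.
class LruFilterCache<K> {
    private final Map<K, BitSet> cache;
    private final Function<K, BitSet> compute;
    int misses = 0; // number of times the filter had to be computed

    LruFilterCache(final int maxEntries, Function<K, BitSet> compute) {
        this.compute = compute;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<K, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, BitSet> eldest) {
                return size() > maxEntries; // evict the least recently used box
            }
        };
    }

    // Returns a copy so callers can AND/OR it without corrupting the cached entry.
    BitSet bits(K key) {
        BitSet bits = cache.get(key);
        if (bits == null) {
            bits = compute.apply(key);
            misses++;
            cache.put(key, bits);
        }
        return (BitSet) bits.clone();
    }
}
```

Counting misses like this also gives a cheap hit-rate measurement to include in the side-by-side comparison.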
>
> Once you've found the "best" way to do your boxing ... then look into
> using FunctionQueries to influence your scores based on distance from the
> center of the box.
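A distance-from-center score of the kind a FunctionQuery could supply might look like the following. This is a sketch under stated assumptions: the haversine great-circle formula, a mean Earth radius of 6371 km, and the 1 / (1 + km) decay are illustrative choices, and `DistanceScore` is a hypothetical class, not part of FunctionQuery.

```java
// Hypothetical distance-based score: 1.0 at the center, decaying with distance.
class DistanceScore {
    static final double EARTH_RADIUS_KM = 6371.0; // approximate mean Earth radius

    // Great-circle distance in kilometers using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp guards against tiny floating-point overshoot above 1.0.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Closer documents score higher; the decay curve is an arbitrary choice.
    static float score(double docLat, double docLon, double centerLat, double centerLon) {
        return (float) (1.0 / (1.0 + haversineKm(docLat, docLon, centerLat, centerLon)));
    }
}
```

For small boxes a flat Euclidean distance in degrees would rank documents almost identically and is cheaper; haversine matters more as boxes grow.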
>
> :
> : Great feedback, thanks for the notes.
> :
> : -- jeff
> :
> : On 2/28/06, Chris Hostetter <ho...@fucit.org> wrote:
> : >
> : >
> : > : Geo definition:
> : > : Boxing around a center point.  It's not critical to do a radius search
> : > : with a given circle.  A boxed approach allows for taller or wider frames
> : > : of reference, which are applicable for our use.
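A box with independent half-height and half-width, as described here, can be derived from the center point. This is a minimal sketch; `CenteredBox` and the use of degree-based half-extents are assumptions, and boxes that cross the ±180° meridian are deliberately not handled.

```java
// Hypothetical lat/lon box around a center point. Separate half-height and
// half-width (both in degrees) allow taller or wider frames of reference.
final class CenteredBox {
    final double minLat, maxLat, minLon, maxLon;

    CenteredBox(double centerLat, double centerLon, double halfHeightDeg, double halfWidthDeg) {
        // Latitude is clamped to the valid range; longitude wrap-around is NOT handled.
        this.minLat = Math.max(-90.0, centerLat - halfHeightDeg);
        this.maxLat = Math.min(90.0, centerLat + halfHeightDeg);
        this.minLon = centerLon - halfWidthDeg;
        this.maxLon = centerLon + halfWidthDeg;
    }

    // Inclusive on all four edges.
    boolean contains(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }
}
```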
> : >
> : > If you are just looking to confine your results to a box, then I think
> : > RangeFiltering on both the X and Y axes will be more efficient than the
> : > individual term queries you are producing.
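Filtering each axis separately and intersecting the two results is the essence of using one range restriction per axis. Here is a sketch in plain Java, assuming coordinates are held in parallel arrays indexed by document id; `AxisRangeFilter` is a hypothetical name and the arrays stand in for the index. One caveat when mapping this onto Lucene of this era: RangeFilter compares terms as strings, so numeric coordinates need a fixed-width, lexicographically sortable encoding.

```java
import java.util.BitSet;

// Hypothetical per-axis range filtering over parallel coordinate arrays.
class AxisRangeFilter {
    // Bit i is set when values[i] lies in [min, max].
    static BitSet range(double[] values, double min, double max) {
        BitSet bits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] >= min && values[doc] <= max) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // A document is inside the box only if it passes both axis filters.
    static BitSet box(double[] lats, double[] lons,
                      double minLat, double maxLat, double minLon, double maxLon) {
        BitSet result = range(lats, minLat, maxLat);
        result.and(range(lons, minLon, maxLon));
        return result;
    }
}
```

Because each axis filter depends only on its own bounds, the per-axis BitSets are also natural candidates for caching independently.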
> : >
> : > It will have the added bonus of not artificially affecting the scores of
> : > the documents based on how often a particular digit appears in a
> : > particular position of the latitude across your corpus.
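The scoring side effect can be reproduced in a few lines. This assumes a hypothetical encoding in which each digit of a fixed-width latitude string is indexed as a position-tagged term (for example "3:0" for digit 0 at position 3), together with DefaultSimilarity-style idf. Two documents that both fall inside the same box end up with different idf totals simply because one of them contains a rarer digit. The sketch ignores the other factors in a real Lucene score (query norm, coord, field norms).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical digit-per-position encoding, used only to show the idf skew.
class DigitTermSkew {
    // "4730" -> ["0:4", "1:7", "2:3", "3:0"]; position tags keep terms unique per doc.
    static String[] terms(String lat) {
        String[] out = new String[lat.length()];
        for (int i = 0; i < lat.length(); i++) {
            out[i] = i + ":" + lat.charAt(i);
        }
        return out;
    }

    // Document frequency of each position-tagged term across the corpus.
    static Map<String, Integer> docFreqs(String[] corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (String lat : corpus) {
            for (String t : terms(lat)) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    // Sum of 1 + ln(numDocs / (docFreq + 1)) over a document's terms: rarer digits add more.
    static double idfSum(String lat, Map<String, Integer> df, int numDocs) {
        double sum = 0;
        for (String t : terms(lat)) {
            sum += Math.log(numDocs / (double) (df.getOrDefault(t, 0) + 1)) + 1.0;
        }
        return sum;
    }
}
```

With the corpus {"4730", "4731", "4731", "4731"}, the "4730" document gets the larger total because "3:0" occurs in only one document, even though all four documents lie in the same box.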
> : >
> : > Once you've filtered down to a particular bounding box, you might consider
> : > going back to the function query approach to score documents inside that
> : > box based on their actual distance from the center point.  I don't recall
> : > at the moment, but I believe FunctionQuery's Scorer supports skipTo in
> : > such a way that it won't bother computing the function for a document
> : > that has been skipped (i.e., when contained in a BooleanQuery with another
> : > clause that has already prohibited it, or when executed in the context of
> : > a Filter).
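The behavior described here, evaluating the distance function only for documents that survive the filter, can be sketched by walking the filter's set bits. This illustrates the idea with hypothetical names (`FilteredScoring`, `scoreFn`); it is not FunctionQuery's actual Scorer, and whether FunctionQuery skips work this way is, as the message says, from memory.

```java
import java.util.BitSet;
import java.util.function.IntToDoubleFunction;

// Hypothetical "filter first, score survivors" loop.
class FilteredScoring {
    int evaluations = 0; // how many times the (possibly expensive) function ran

    double[] scoreFiltered(BitSet filter, int maxDoc, IntToDoubleFunction scoreFn) {
        double[] scores = new double[maxDoc]; // filtered-out docs keep 0.0
        // nextSetBit jumps straight to the next accepted doc, skipping the rest.
        for (int doc = filter.nextSetBit(0); doc >= 0 && doc < maxDoc;
             doc = filter.nextSetBit(doc + 1)) {
            scores[doc] = scoreFn.applyAsDouble(doc);
            evaluations++;
        }
        return scores;
    }
}
```

With a small box, the number of function evaluations tracks the number of documents inside the box rather than the size of the index, which is why doing the boxing first pays off.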
> : >
> : >
> : >
> : > -Hoss
> : >
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : > For additional commands, e-mail: java-user-help@lucene.apache.org
> : >
> : >
> :
>
>
>
> -Hoss
>