Posted to java-user@lucene.apache.org by 小鱼儿 <ct...@gmail.com> on 2020/01/10 04:39:58 UTC

Question about Lucene's IndexSearcher.search(Query query, int n) API's parameter n

I'm doing a POI (point-of-interest) search using Lucene. Each POI has a
"location" of a GeoPoint/LonLat type. I need to do a keyword-range
search, but the resulting POIs need to be sorted by distance from a starting
point.

This "distance" is in fact a dynamically computed property that cannot be
used with the SortField API. I wonder whether Lucene can support a
"DynamicSortField"; that would be perfect. Otherwise I have to
use the IndexSearcher.search(Query query, int n) API to first filter out the
top-n POIs and then sort them manually after the StoredFields of these n
documents have all been loaded, which seems inefficient.

The parameter n in the IndexSearcher.search API has a usability
problem: it may not be large enough to cover all the candidates. Also, the
low-level search(Query, Collector) API seems to be poorly documented.
If I set n to a very large value, the subsequent sort may be very
inefficient...

My current idea: use finer-grained near-to-far geo sub-ranges to
iteratively/incrementally search/filter -> load documents -> sort manually ->
combine.
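
The near-to-far idea can be sketched in plain Java, with no Lucene calls. Everything here is an assumption made for illustration: the Poi class, the nearest method and the radiiKm band limits are hypothetical names, and the scan over an in-memory list stands in for what would be a distance-restricted query combined with the keyword query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RingSearchSketch {
    /** Hypothetical in-memory stand-in for an indexed POI. */
    static final class Poi {
        final String name;
        final double lat;
        final double lon;

        Poi(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in kilometers (haversine formula, mean Earth radius). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0088 * Math.asin(Math.sqrt(a));
    }

    /**
     * Walks distance bands [0, r1), [r1, r2), ... from near to far, sorts each
     * band on its own, and stops as soon as n results have been collected.
     */
    static List<Poi> nearest(List<Poi> candidates, double lat, double lon,
                             int n, double[] radiiKm) {
        List<Poi> result = new ArrayList<>();
        double inner = 0.0;
        for (double outer : radiiKm) {
            // Stand-in for a per-band search: keep candidates inside this ring.
            List<Poi> band = new ArrayList<>();
            for (Poi p : candidates) {
                double d = haversineKm(lat, lon, p.lat, p.lon);
                if (d >= inner && d < outer) {
                    band.add(p);
                }
            }
            // Only the current band is sorted, never the full candidate set.
            band.sort(Comparator.comparingDouble(
                    p -> haversineKm(lat, lon, p.lat, p.lon)));
            result.addAll(band);
            if (result.size() >= n) {
                break; // Farther bands cannot contain closer POIs.
            }
            inner = outer;
        }
        return result.size() > n ? new ArrayList<>(result.subList(0, n)) : result;
    }
}
```

Because the bands are disjoint and visited in increasing distance, concatenating the per-band results keeps the overall order correct. The band limits are a tuning choice: bands that are too narrow cost many searches, bands that are too wide bring back the large manual sort.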

Any suggestions?

Re: Question about Lucene's IndexSearcher.search(Query query, int n) API's parameter n

Posted by Uwe Schindler <uw...@thetaphi.de>.
You can sort with custom formulas. All values needed for the calculation must be in the index as docvalues fields. You can then use the expressions module to supply a formula for the calculation, which may include the original score. The expressions module can either override the score (so standard sorting works) or provide a SortField.

https://lucene.apache.org/core/8_4_0/expressions/org/apache/lucene/expressions/Expression.html

This is only a bad idea if the calculation is expensive, because it has to be done for every possible hit. One optimization is therefore to do a simple calculation with expressions that brings all documents into an approximate order, so that only the top-n need to be sorted manually.
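
The two-step idea (cheap formula for every hit, exact calculation only for a shortlist) can be illustrated in plain Java, independent of Lucene. This is a minimal sketch under stated assumptions: the Poi class, cheapKey, topByDistance and the oversample factor are hypothetical names chosen for the example. The cheap key plays the role an expression over docvalues fields would play, and the haversine re-sort plays the role of the manual top-n sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TwoPhaseDistanceSort {
    /** Hypothetical in-memory stand-in for a hit with lat/lon values. */
    static final class Poi {
        final String name;
        final double lat;
        final double lon;

        Poi(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /**
     * Cheap approximate ordering key: squared equirectangular offset in degrees.
     * No trigonometry per hit; cosLat0 is computed once per search.
     * Assumption: ignores the antimeridian and degrades near the poles.
     */
    static double cheapKey(double lat0, double lon0, double cosLat0, Poi p) {
        double dLat = p.lat - lat0;
        double dLon = (p.lon - lon0) * cosLat0;
        return dLat * dLat + dLon * dLon;
    }

    /** Exact great-circle distance in kilometers (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0088 * Math.asin(Math.sqrt(a));
    }

    static List<Poi> topByDistance(List<Poi> hits, double lat0, double lon0,
                                   int n, int oversample) {
        double cosLat0 = Math.cos(Math.toRadians(lat0));
        // Phase 1: rough order over all hits using the cheap key.
        List<Poi> rough = new ArrayList<>(hits);
        rough.sort(Comparator.comparingDouble(p -> cheapKey(lat0, lon0, cosLat0, p)));
        // Keep a shortlist somewhat larger than n to absorb approximation error.
        int keep = Math.min(rough.size(), n * oversample);
        List<Poi> shortlist = new ArrayList<>(rough.subList(0, keep));
        // Phase 2: exact distance only for the shortlist.
        shortlist.sort(Comparator.comparingDouble(
                p -> haversineKm(lat0, lon0, p.lat, p.lon)));
        return new ArrayList<>(shortlist.subList(0, Math.min(n, shortlist.size())));
    }
}
```

The oversample factor is a guess at how far the approximate order can deviate from the exact one. A value of 1 trusts the cheap order completely, while larger values trade extra exact calculations for a safer top-n.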

Uwe


--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de