Posted to java-user@lucene.apache.org by Sascha Fahl <sa...@googlemail.com> on 2008/07/19 11:53:33 UTC
GeoSort approach - your opinion
Hi,
last week I implemented a GeoSort approach in Lucene. Inspired by
"Lucene in Action", I modified the algorithm as follows. When an
IndexReader for a given index is created, a cache for the geo
information is built; this is simply a two-dimensional int array.
This makes it possible to cache the geo information for 1,000,000 docs
in around 8 MB (two 4-byte ints per document). Every time the
ScoreDocComparator.compare(ScoreDoc i, ScoreDoc j) method is called, I
fetch the int array with the geo info from the cache and calculate the
distance.
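A minimal sketch of such a cache and comparator, under assumptions the post does not state: coordinates are stored as microdegrees (degrees x 1,000,000) in two int arrays indexed by document id, and distance is computed with the haversine formula. `GeoSortSketch`, `GeoCache` and `DistanceComparator` are hypothetical names, and a plain `java.util.Comparator<Integer>` over document ids stands in for Lucene's `ScoreDocComparator`, whose `compare(ScoreDoc i, ScoreDoc j)` would read `i.doc` and `j.doc` in the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GeoSortSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Hypothetical per-reader cache: one int column per coordinate, indexed by doc id. */
    static final class GeoCache {
        final int[] latMicro; // latitude in microdegrees (assumed encoding)
        final int[] lonMicro; // longitude in microdegrees (assumed encoding)

        GeoCache(int maxDoc) {
            latMicro = new int[maxDoc];
            lonMicro = new int[maxDoc];
        }

        void put(int doc, double lat, double lon) {
            latMicro[doc] = (int) Math.round(lat * 1_000_000);
            lonMicro[doc] = (int) Math.round(lon * 1_000_000);
        }
    }

    /** Great-circle distance in kilometres (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Orders doc ids by distance from a fixed origin, reading coordinates only from the cache. */
    static final class DistanceComparator implements Comparator<Integer> {
        private final GeoCache cache;
        private final double originLat;
        private final double originLon;

        DistanceComparator(GeoCache cache, double originLat, double originLon) {
            this.cache = cache;
            this.originLat = originLat;
            this.originLon = originLon;
        }

        double distance(int doc) {
            return distanceKm(originLat, originLon,
                    cache.latMicro[doc] / 1e6, cache.lonMicro[doc] / 1e6);
        }

        @Override
        public int compare(Integer i, Integer j) {
            return Double.compare(distance(i), distance(j));
        }
    }

    public static void main(String[] args) {
        GeoCache cache = new GeoCache(3);
        cache.put(0, 48.137, 11.575); // Munich
        cache.put(1, 52.520, 13.405); // Berlin
        cache.put(2, 50.110, 8.682);  // Frankfurt

        List<Integer> hits = new ArrayList<>(Arrays.asList(0, 1, 2));
        hits.sort(new DistanceComparator(cache, 50.110, 8.682)); // query point: Frankfurt
        System.out.println(hits); // [2, 0, 1]
    }
}
```

Because compare() recomputes both distances on every call, sorting n hits evaluates the distance formula O(n log n) times; computing each hit's distance once per query would be a possible refinement, not something the post describes.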
I think this is quite a good solution:
1. Distances are calculated only for actual hits, so no unnecessary
operations are performed.
2. The geo information is not fetched via IndexReader.doc(i) but
directly from the cache, which is held in RAM.
3. All hits are returned, because this approach does not use a
bounding-box model that excludes documents outside a certain radius
(which is very annoying if there is a hit at a distance of 51 km and
the radius is 50 km).
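The difference described in point 3 can be shown with a small, self-contained comparison. It assumes each hit's distance has already been computed; `RadiusVsSort`, `Hit`, `withinRadius` and `sortedByDistance` are hypothetical names, and the distances and the 50 km radius are made-up illustration values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RadiusVsSort {
    /** Hypothetical hit: document id plus its precomputed distance from the query point. */
    static final class Hit {
        final int doc;
        final double distanceKm;

        Hit(int doc, double distanceKm) {
            this.doc = doc;
            this.distanceKm = distanceKm;
        }
    }

    /** Radius model: hits outside the radius are dropped, the rest are sorted. */
    static List<Hit> withinRadius(List<Hit> hits, double radiusKm) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.distanceKm <= radiusKm) {
                kept.add(h);
            }
        }
        kept.sort(Comparator.comparingDouble(h -> h.distanceKm));
        return kept;
    }

    /** Sort-only model: every hit is kept; distance only decides the order. */
    static List<Hit> sortedByDistance(List<Hit> hits) {
        List<Hit> all = new ArrayList<>(hits);
        all.sort(Comparator.comparingDouble(h -> h.distanceKm));
        return all;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(7, 51.0), new Hit(3, 12.5), new Hit(9, 49.9));
        System.out.println(withinRadius(hits, 50.0).size()); // 2: doc 7 at 51 km is lost
        System.out.println(sortedByDistance(hits).size());   // 3: doc 7 is returned last
    }
}
```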
What do you think about this approach? The only possible drawback, I
think, is the cache, because I do not really know how well the JVM
handles 10 MB of data in RAM.
Best regards,
Sascha Fahl
sascha.fahl@gmail.com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: GeoSort approach - your opinion
Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Sat, 2008-07-19 at 11:53 +0200, Sascha Fahl wrote:
> last week I implemented a GeoSort approach in Lucene. Inspired by
> "Lucene in Action", I modified the algorithm as follows. When an
> IndexReader for a given index is created, a cache for the geo
> information is built; this is simply a two-dimensional int array.
> This makes it possible to cache the geo information for 1,000,000 docs
> in around 8 MB.
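Be aware that every array is a separate object with its own header, so
many small arrays take up a fair amount of memory. You'll want to use
only 3 arrays in total rather than 1,000,001 (one outer array plus one
int[2] per document):
// Three arrays in total: one outer array holding two columns,
// each column indexed by document id.
int[][] coordinates = new int[2][];
coordinates[0] = new int[1000000];
coordinates[1] = new int[1000000];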
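To put a rough number on the difference, the sketch below estimates both layouts for 1,000,000 documents. The sizes are assumptions, not measurements: a 16-byte array header, 4-byte (compressed) references and 8-byte object alignment, which is typical of a 64-bit HotSpot JVM but varies between JVMs and settings. `ArrayFootprint` and its methods are hypothetical names.

```java
public class ArrayFootprint {
    // Assumed sizes (typical 64-bit HotSpot with compressed oops; other JVMs differ).
    static final long ARRAY_HEADER = 16; // mark word + class pointer + length field
    static final long REF_SIZE = 4;      // compressed object reference
    static final long ALIGN = 8;         // objects are padded to a multiple of 8 bytes

    static long align(long bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    static long intArray(long length) {
        return align(ARRAY_HEADER + 4 * length);
    }

    static long refArray(long length) {
        return align(ARRAY_HEADER + REF_SIZE * length);
    }

    /** new int[n][2]: one outer array of n references plus n small int[2] arrays. */
    static long rowLayout(long n) {
        return refArray(n) + n * intArray(2);
    }

    /** new int[2][] with two int[n] columns: only three array objects in total. */
    static long columnLayout(long n) {
        return refArray(2) + 2 * intArray(n);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("row layout:    " + rowLayout(n) + " bytes");
        System.out.println("column layout: " + columnLayout(n) + " bytes");
    }
}
```

Under these assumptions the int[1000000][2] layout comes to 28,000,016 bytes (about 28 MB, mostly per-array overhead), while the three-array layout comes to 8,000,056 bytes, in line with the 8 MB payload estimate.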
[...]
> What do you think about this approach?
Sounds fine when the index rarely changes, since the cache has to be
rebuilt every time a new IndexReader is opened.
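The per-reader lifecycle can be sketched as one cache per reader instance, held in a `java.util.WeakHashMap` so that a cache can be garbage-collected together with its reader. This is an assumption about how one might organise it, not the poster's code: `GeoCacheRegistry` and its loader function are hypothetical, and `Object` stands in for Lucene's `IndexReader` so the sketch compiles on its own (WeakHashMap matches keys via equals/hashCode, which for a plain `Object` is identity).

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class GeoCacheRegistry {
    // Weak keys: once a reader is no longer referenced elsewhere, its cache can be collected too.
    private final Map<Object, int[][]> caches = new WeakHashMap<>();
    // Hypothetical loader that reads every document's coordinates into two int columns.
    private final Function<Object, int[][]> loader;
    private int loads = 0;

    public GeoCacheRegistry(Function<Object, int[][]> loader) {
        this.loader = loader;
    }

    /** Returns the cache for this reader, building it on first use. */
    public synchronized int[][] get(Object reader) {
        int[][] cache = caches.get(reader);
        if (cache == null) {
            cache = loader.apply(reader); // full pass over the index: the expensive part
            caches.put(reader, cache);
            loads++;
        }
        return cache;
    }

    /** Number of times a cache has been built so far. */
    public synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        GeoCacheRegistry registry =
                new GeoCacheRegistry(r -> new int[][] {new int[1_000_000], new int[1_000_000]});
        Object reader = new Object();
        registry.get(reader);
        registry.get(reader);           // served from the cache
        Object reopened = new Object(); // stands in for a reader reopened after an index change
        registry.get(reopened);         // triggers a second full build
        System.out.println(registry.loadCount()); // 2
    }
}
```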
> The only possible drawback, I think, is the cache, because I do not
> really know how well the JVM handles 10 MB of data in RAM.
The Sun JVM is perfectly capable of handling large arrays efficiently.
We use an array-based structure of ints and longs for fast facet
lookup that takes up approximately 300 MB.