You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Apostolis Xekoukoulotakis <xe...@gmail.com> on 2012/12/22 16:56:16 UTC

tries and spatial search

I just found out about the blocktree implementation and how it is used to
increase the speed of prefix search.

Have you tried to use it for spatial search?
I will explain to you how i think it will work. Please correct me if I am
wrong.

A radix tree is a structure that you can use to efficiently search terms
with a specific prefix.
Spatial searches are prefix searches. Lets see why.

Lets say we have the surface of the earth and we use as coordinate system
the Geographic coordinate system<
http://en.wikipedia.org/wiki/Geographic_coordinate_system>
 .

We then create a grid of 4 square surfaces and create an id for each one.
...
We split each square surface into 4 square surfaces with an id.

When we want to point to a location, its uid would be of this form:
idlv1idlv2idlv3...

Each id can be encoded with 4 bits.
The surface of the earth (510million kms) would then require 29 bits to
have the precision of 1m^2.

How do we encode arbitrary closed surfaces?
Here is what measure theory does: check page 7. theorem 1.4 figure
3<http://press.princeton.edu/chapters/s8008.pdf>
This theorem says that any surface can be described by a countable amout of
squares.
Lucene can only handle finite number of terms.
We thus  put an upper limit on the number of squares that represent our
surface. The ids of those squares are the terms that need to be searched.


In conclusion, searching documents inside an arbitrary surface means
creating the trie of that surface and then looking through the inverted
index trie for those terms.

The scoring that we will implement will be the surface area of the common
terms,(ie the surface area of the intersection of those surfaces)



*So 2 questions, *
*1)Will this work in general?*
*2) I want to avoid writing new code.*
*
*
* Is the blocktree implementation much different than a radix tree that it
wouldnt not make it work?*
*Does blocktree accept arbitrary byte or bit prefixes?*
*Which classes would I use if all the above answers are positive?*



-- 


Sincerely yours,

     Apostolis Xekoukoulotakis

Re: tries and spatial search

Posted by Apostolis Xekoukoulotakis <xe...@gmail.com>.
The new Spatial contrib module has already implemented what I was talking.




2012/12/22 Apostolis Xekoukoulotakis <xe...@gmail.com>

> I just found out about the blocktree implementation and how it is used to
> increase the speed of prefix search.
>
> Have you tried to use it for spatial search?
> I will explain to you how i think it will work. Please correct me if I am
> wrong.
>
> A radix tree is a structure that you can use to efficiently search terms
> with a specific prefix.
> Spatial searches are prefix searches. Lets see why.
>
> Lets say we have the surface of the earth and we use as coordinate system
> the Geographic coordinate system<http://en.wikipedia.org/wiki/Geographic_coordinate_system>
>  .
>
> We then create a grid of 4 square surfaces and create an id for each one.
> ...
> We split each square surface into 4 square surfaces with an id.
>
> When we want to point to a location, its uid would be of this form:
> idlv1idlv2idlv3...
>
> Each id can be encoded with 4 bits.
> The surface of the earth (510million kms) would then require 29 bits to
> have the precision of 1m^2.
>
> How do we encode arbitrary closed surfaces?
> Here is what measure theory does: check page 7. theorem 1.4 figure 3<http://press.princeton.edu/chapters/s8008.pdf>
> This theorem says that any surface can be described by a countable amout
> of squares.
> Lucene can only handle finite number of terms.
> We thus  put an upper limit on the number of squares that represent our
> surface. The ids of those squares are the terms that need to be searched.
>
>
> In conclusion, searching documents inside an arbitrary surface means
> creating the trie of that surface and then looking through the inverted
> index trie for those terms.
>
> The scoring that we will implement will be the surface area of the common
> terms,(ie the surface area of the intersection of those surfaces)
>
>
>
> *So 2 questions, *
> *1)Will this work in general?*
> *2) I want to avoid writing new code.*
> *
> *
> * Is the blocktree implementation much different than a radix tree that
> it wouldnt not make it work?*
> *Does blocktree accept arbitrary byte or bit prefixes?*
> *Which classes would I use if all the above answers are positive?*
>
>
>
> --
>
>
> Sincerely yours,
>
>      Apostolis Xekoukoulotakis
>
>


-- 


Sincerely yours,

     Apostolis Xekoukoulotakis