You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/05/11 17:50:00 UTC

[jira] [Commented] (LUCENE-6477) Add BKD tree for spatial shape query intersecting indexed points

    [ https://issues.apache.org/jira/browse/LUCENE-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538079#comment-14538079 ] 

Michael McCandless commented on LUCENE-6477:
--------------------------------------------

I tested performance with the same "bounding boxes around London, UK"
on 60.8M points test from LUCENE-6450:

{noformat}
  Index size disk: 1.2 GB
  Index size heap: 0.75 MB
  Index time: 636.0 seconds (incl. forceMerge)
  Mean query time: .0068 sec
{noformat}
 
Indexing time is a bit slower (makes heavy use of OfflineSorter, but
should be O(N*log(N) overall).

It's very fast at search time: ~5.7X faster than GeoHashPrefixTree.

It also uses very little heap (0.75 MB) at search time for the inner
nodes of the tree.

It currently allocates FixedBitSet(maxDoc) per query & segment at
search time (like spatial does with RecursivePrefixTreeStrategy, I
think?).  Really it should use BitDocIdSet.Builder, to start sparse
and upgrade to FixedBitSet only if result set will be "big-ish", but
when I do that the query is 2.4X slower, which is frustrating.  I think
we could try using SentinelIntSet for the sparse case, like spatial
does in some cases.


> Add BKD tree for spatial shape query intersecting indexed points
> ----------------------------------------------------------------
>
>                 Key: LUCENE-6477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.2
>
>         Attachments: LUCENE-6477.patch
>
>
> I'd like to explore using dedicated spatial trees for faster shape
> intersection filters than postings-based implementations.
> I implemented the tree data structure from
> https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
> The idea is simple: it builds a full binary tree, partitioning 2D
> space, alternately on lat and then lon, into smaller and smaller
> rectangles until a leaf has <= N (default 1024) points.
> It cannot index shapes (just points), and can then do fast shape
> intersection queries.  Multi-valued fields are supported.
> I only implemented the "point is contained in this bounding box" query
> for now, but I think polygon shape querying should be easy to
> implement using the same approach from LUCENE-6450.
> For indexing, you add BKDPointField (takes lat, lon) to your doc, and
> must set up your Codec use BKDTreeDocValuesFormat for that field.
> This DV format wraps Lucene50DVFormat, but then builds the disk-based
> BKD tree structure on the side.  BKDPointInBBoxQuery then requires this
> DVFormat, and casts it to gain access to the tree.
> I quantize each incoming double lat/lon to 32 bits precision (so 64
> bits per point) = ~9 milli-meter lon precision at the equator, I
> think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org