Posted to dev@lucene.apache.org by "Nicholas Knize (JIRA)" <ji...@apache.org> on 2015/07/21 06:23:05 UTC

[jira] [Comment Edited] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards

    [ https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634504#comment-14634504 ] 

Nicholas Knize edited comment on LUCENE-6685 at 7/21/15 4:22 AM:
-----------------------------------------------------------------

I put together a visualization of the ranges that were being created (I will add a link to the video when I post it). This revealed some interesting issues. At precisionStep 6 and detailLevel 16, the worst-case boundary condition produced nearly 2 million ranges, and a 100-iteration beast test took just over an hour. Reducing precisionStep to 3 and detailLevel to 12 cut the number of ranges to just over 10K, and the 100-iteration beast test dropped from over an hour to just over 8 minutes. There was also a bug in the pointDistance query that added unnecessary high-resolution ranges that fell within the bounding box but outside the actual pointRadius. Patch included.
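To see why lowering the detail level shrinks the range list so sharply, here is a minimal sketch that counts the ranges a quadtree-style decomposition emits for a query box. It is not Lucene's actual range-building code: the class and method names are hypothetical, the space is a 2^bits x 2^bits integer grid, each level splits a cell into four (so precisionStep is not modeled separately), and maxLevel stands in for detailLevel.

```java
/**
 * Hypothetical sketch, not Lucene code: counts the term ranges a quadtree-style
 * decomposition emits for an axis-aligned query box on a 2^bits x 2^bits grid.
 * Each level splits a cell into 4 children; maxLevel plays the role of detailLevel.
 */
final class RangeCountSketch {

    /** Query box bounds are inclusive grid coordinates. */
    static long countRanges(long minX, long minY, long maxX, long maxY, int bits, int maxLevel) {
        return visit(0, 0, bits, 0, minX, minY, maxX, maxY, maxLevel);
    }

    // The cell covers [x, x + 2^sizeBits) x [y, y + 2^sizeBits).
    private static long visit(long x, long y, int sizeBits, int level,
                              long minX, long minY, long maxX, long maxY, int maxLevel) {
        long side = 1L << sizeBits;
        long cellMaxX = x + side - 1;
        long cellMaxY = y + side - 1;
        if (cellMaxX < minX || x > maxX || cellMaxY < minY || y > maxY) {
            return 0; // disjoint from the query: no range
        }
        if (x >= minX && cellMaxX <= maxX && y >= minY && cellMaxY <= maxY) {
            return 1; // fully inside: one range covers the whole cell
        }
        if (level == maxLevel || sizeBits == 0) {
            return 1; // crossing cell at the detail limit: emitted as-is, hits need post-filtering
        }
        long half = side >> 1;
        int childBits = sizeBits - 1;
        // Crossing cell above the limit: refine into four children.
        return visit(x, y, childBits, level + 1, minX, minY, maxX, maxY, maxLevel)
             + visit(x + half, y, childBits, level + 1, minX, minY, maxX, maxY, maxLevel)
             + visit(x, y + half, childBits, level + 1, minX, minY, maxX, maxY, maxLevel)
             + visit(x + half, y + half, childBits, level + 1, minX, minY, maxX, maxY, maxLevel);
    }
}
```

For a box that misses the outermost row and column on every side, such as countRanges(1, 1, 1022, 1022, 10, maxLevel), levels 0 through 3 give 1, 4, 16 and 52 ranges. New ranges come from cells along the perimeter, and in this model the number of perimeter cells roughly doubles per level, so each extra level of detail roughly doubles the list.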
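The pointDistance bug comes down to which shape a cell is tested against. A minimal sketch, assuming planar Euclidean distance in place of great-circle distance and using hypothetical names: a cell in a corner of the circle's bounding box passes the box test but fails the circle test, so it should produce no range at all.

```java
/**
 * Hypothetical sketch: classify a grid cell against a circle (center cx, cy,
 * radius r) instead of against the circle's bounding box. Planar Euclidean
 * distance stands in for great-circle distance.
 */
final class CircleCellSketch {

    enum Relation { OUTSIDE, INSIDE, CROSSES }

    /** Bounding-box test: true for any cell that touches the circle's bbox. */
    static boolean touchesBBox(double minX, double minY, double maxX, double maxY,
                               double cx, double cy, double r) {
        return maxX >= cx - r && minX <= cx + r && maxY >= cy - r && minY <= cy + r;
    }

    /** Circle test: relation of the cell [minX, maxX] x [minY, maxY] to the circle itself. */
    static Relation relate(double minX, double minY, double maxX, double maxY,
                           double cx, double cy, double r) {
        // Nearest point of the cell to the center: clamp the center into the cell.
        double nearX = Math.max(minX, Math.min(cx, maxX));
        double nearY = Math.max(minY, Math.min(cy, maxY));
        if (Math.hypot(nearX - cx, nearY - cy) > r) {
            return Relation.OUTSIDE; // no range needed, even if the cell is inside the bbox
        }
        // Farthest corner of the cell from the center.
        double farDx = Math.max(Math.abs(minX - cx), Math.abs(maxX - cx));
        double farDy = Math.max(Math.abs(minY - cy), Math.abs(maxY - cy));
        if (Math.hypot(farDx, farDy) <= r) {
            return Relation.INSIDE; // one range, no post-filtering needed
        }
        return Relation.CROSSES; // refine further, or post-filter at the detail limit
    }
}
```

For a circle centered at (0, 0) with radius 10, the cell [8, 10] x [8, 10] touches the bounding box, yet its nearest point is about 11.3 from the center, so relate returns OUTSIDE and no high-resolution ranges are generated for it.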



> GeoPointInBBox/Distance queries should have safeguards
> ------------------------------------------------------
>
>                 Key: LUCENE-6685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6685
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6685.patch
>
>
> These queries build a big list of term ranges, where the size of the list is in proportion to how many cells of the space filling curve are "crossed" by the perimeter of the query (I think?).
> This can easily be 100s of MBs for a big enough query, not to mention slow to enumerate (and we redo this for each segment).
> I think the queries should have safeguards, much like we have maxDeterminizedStates for Automaton based queries, to prevent accidental OOMEs.
> But I think longer term we should either change the ranges to be enumerated on-demand and never stored in their entirety (like NumericRangeTermsEnum), or change the query so it has a fixed budget of how many cells it's allowed to visit and then, within a crossing cell, uses doc values to post-filter.
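The two longer-term options can be combined. The sketch below is hypothetical (the class, its fields and the maxSplits parameter are not Lucene API) and uses a simple integer-grid model: cells come out lazily from a small stack, in the spirit of NumericRangeTermsEnum, so the full range list is never materialized, and a fixed budget of cell splits caps the work. Once the budget is spent, a crossing cell is emitted whole and flagged for post-filtering, which is where a doc-values check would come in. A cap like this plays a role similar to maxDeterminizedStates, except that it degrades to post-filtering instead of failing the query.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch: lazy range enumeration over a 2^bits x 2^bits grid with a
 * budget on how many crossing cells may be refined. Cells with postFilter = true
 * were emitted whole after the budget ran out; their hits need an exact check.
 */
final class BudgetedRangeIterator implements Iterator<BudgetedRangeIterator.Cell> {

    /** Cell [x, x + 2^sizeBits) x [y, y + 2^sizeBits) on the integer grid. */
    static final class Cell {
        final long x, y;
        final int sizeBits;
        final boolean postFilter;

        Cell(long x, long y, int sizeBits, boolean postFilter) {
            this.x = x; this.y = y; this.sizeBits = sizeBits; this.postFilter = postFilter;
        }

        long area() { return 1L << (2 * sizeBits); }
    }

    private final long minX, minY, maxX, maxY;            // inclusive query box
    private final Deque<Cell> stack = new ArrayDeque<>(); // pending, not yet classified
    private int splitsLeft;                               // remaining refinement budget
    private Cell next;

    BudgetedRangeIterator(long minX, long minY, long maxX, long maxY, int bits, int maxSplits) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        this.splitsLeft = maxSplits;
        stack.push(new Cell(0, 0, bits, false));
        advance();
    }

    // Pop cells until one should be emitted (or the stack is empty).
    private void advance() {
        next = null;
        while (next == null && !stack.isEmpty()) {
            Cell c = stack.pop();
            long cellMaxX = c.x + (1L << c.sizeBits) - 1;
            long cellMaxY = c.y + (1L << c.sizeBits) - 1;
            if (cellMaxX < minX || c.x > maxX || cellMaxY < minY || c.y > maxY) {
                continue; // disjoint: nothing to emit
            }
            if (c.x >= minX && cellMaxX <= maxX && c.y >= minY && cellMaxY <= maxY) {
                next = c; // fully inside: exact range
            } else if (splitsLeft > 0 && c.sizeBits > 0) {
                splitsLeft--;
                long half = 1L << (c.sizeBits - 1);
                int b = c.sizeBits - 1;
                // Push in reverse so children pop in z-order (x varies fastest).
                stack.push(new Cell(c.x + half, c.y + half, b, false));
                stack.push(new Cell(c.x, c.y + half, b, false));
                stack.push(new Cell(c.x + half, c.y, b, false));
                stack.push(new Cell(c.x, c.y, b, false));
            } else {
                next = new Cell(c.x, c.y, c.sizeBits, true); // budget spent: emit whole, post-filter
            }
        }
    }

    @Override
    public boolean hasNext() { return next != null; }

    @Override
    public Cell next() {
        if (next == null) throw new NoSuchElementException();
        Cell result = next;
        advance();
        return result;
    }
}
```

Children are visited in z-order, so if x takes the lower bit of each interleaved pair, cells are emitted in increasing Morton-code order, as a terms enum would need. Depth-first traversal keeps at most about three pending siblings per level on the stack, but it spends the whole budget near the first corner it visits; a breadth-first queue would spread refinement around the perimeter at the cost of holding a full level of cells in memory.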


