Posted to java-user@lucene.apache.org by parth_n <na...@asu.edu> on 2014/06/08 07:10:28 UTC

Lucene Spatial Question: How to retrieve all results within a bounding box?

Hi everyone,

I am trying to retrieve all results within a given bounding box in a 2-D
space. I understand that the scoring function is based on the distance from
the center of the query. I am not looking to retrieve top-k results, but all
of them. 

I have read previous threads on similar questions, and the solutions are
either outdated (written for previous versions) or inefficient (Option 1: pass
Integer.MAX_VALUE as k; Option 2: use a TotalHitCountCollector, get the
total number of results with getTotalHits(), and then pass that number to
the top-k search).

I am looking for all the results in the bounding box, and do not care about
the order. If possible, I do not want to waste any computation on the
sorting needed for top-k functionality.

Question: Is there a better solution that I can use instead of the two
options mentioned above?

Any reply is much appreciated. Thanks!


Code snippet for Option 1 above:

SpatialArgs args = new SpatialArgs(SpatialOperation.IsWithin,
    ctx.makeRectangle(minX, maxX, minY, maxY));

Filter filter = strategy.makeFilter(args);
// Ask for Integer.MAX_VALUE hits so that every match comes back.
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter,
    Integer.MAX_VALUE);

ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (ScoreDoc s : scoreDocs) {
    Document doc = searcher.doc(s.doc);
    System.out.println(doc.get("id") + "\t" + doc.get("name"));
}
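
To make the concern about sorting concrete, here is a small generic illustration in plain Java. It is not Lucene's actual implementation; `TopKVersusAllSketch`, `topK`, and `all` are hypothetical names, and the scores are made up. It contrasts a top-k pass, which maintains a bounded heap and sorts at the end, with a pass that simply records every hit as it arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Generic illustration only; none of this is Lucene code. */
class TopKVersusAllSketch {

    /** Top-k by score with a bounded min-heap: each hit can cost O(log k). */
    static List<Integer> topK(float[] scores, int k) {
        // Doc ids ordered by ascending score, so the weakest kept hit sits on top.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.add(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the current weakest hit
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(scores[b], scores[a])); // best first
        return result;
    }

    /** Unordered collection: constant work per hit, no heap, no final sort. */
    static List<Integer> all(float[] scores) {
        List<Integer> result = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.0f, 3.0f}; // made-up scores for docs 0..3
        System.out.println(topK(scores, 2)); // [3, 1]
        System.out.println(all(scores));     // [0, 1, 2, 3]
    }
}
```

When k is as large as the number of hits, the heap and the final sort buy nothing if the order is irrelevant, which is the waste described above.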




--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Spatial-Question-How-to-retrieve-all-results-within-a-bounding-box-tp4140616.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Spatial Question: How to retrieve all results within a bounding box?

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
Yes; as I said in my last sentence: “You’ll see a difference of Document vs
StoredDocument with 4x”. In other words, on 4x use Document where the sample
uses StoredDocument.

As to SimpleCollector not being in 4x (I didn’t check but I’ll take your
word for it), the bottom line is that you need to write a Collector, and a
simple one at that.
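
For anyone on 4.x following along, here is a rough sketch of the shape such a collector takes. It deliberately uses stand-in types so that it runs on its own: `HitCollector`, `setNextSegment`, `searchAll`, and `collectAll` are hypothetical names, not Lucene classes or methods. In Lucene 4.x the abstract `Collector` class declares `setScorer`, `collect`, `setNextReader`, and `acceptsDocsOutOfOrder`; the sketch only mirrors the per-segment callback idea (remember the current segment, then handle each hit as it arrives, with no priority queue).

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in types only; none of these are Lucene classes. */
class CollectAllSketch {

    /** Hypothetical callback, loosely shaped like a per-segment hit collector. */
    interface HitCollector {
        void setNextSegment(int docBase); // called once before each segment's hits
        void collect(int segmentDocId);   // called once per matching document
    }

    /** Hypothetical "search": reports every match, segment by segment, unsorted. */
    static void searchAll(int[][] matchesPerSegment, int[] docBases,
                          HitCollector collector) {
        for (int seg = 0; seg < matchesPerSegment.length; seg++) {
            collector.setNextSegment(docBases[seg]);
            for (int doc : matchesPerSegment[seg]) {
                collector.collect(doc);
            }
        }
    }

    /** Gathers index-wide doc ids for all hits; no scores, no top-k queue. */
    static List<Integer> collectAll(int[][] matchesPerSegment, int[] docBases) {
        final List<Integer> hits = new ArrayList<>();
        searchAll(matchesPerSegment, docBases, new HitCollector() {
            private int docBase;

            @Override
            public void setNextSegment(int docBase) {
                this.docBase = docBase;
            }

            @Override
            public void collect(int segmentDocId) {
                // Segment-relative id plus the segment's base gives the global id.
                hits.add(docBase + segmentDocId);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        int[][] matches = { {0, 2}, {1} }; // made-up matches in two segments
        int[] docBases = { 0, 3 };         // made-up starting doc id of each segment
        System.out.println(collectAll(matches, docBases)); // [0, 2, 4]
    }
}
```

In a real collector you would most likely load the stored document inside `collect` rather than record ids, as the trunk sample in this thread does.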

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 8, 2014 at 5:00 PM, parth_n <na...@asu.edu> wrote:

> Thanks a lot for the reply David!
>
> I am having some problems executing this code. I am using 4.8.1. I tried
> looking for StoredDocument and SimpleCollector in the source code but
> couldn't find them. Am I missing something?

Re: Lucene Spatial Question: How to retrieve all results within a bounding box?

Posted by parth_n <na...@asu.edu>.
Thanks a lot for the reply David!

I am having some problems executing this code. I am using 4.8.1. I tried
looking for StoredDocument and SimpleCollector in the source code but
couldn't find them. Am I missing something?





Re: Lucene Spatial Question: How to retrieve all results within a bounding box?

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
Hi.

Your question is actually not particularly spatial; it’s more
circumstantial to your particular query. You want to know how to do a query
and collect *all* the results, in no particular order.  To do this
efficiently, you need to use a Collector.  Also, I noticed you are using
the “IsWithin” predicate.  If all of your data consists of points, then
“Intersects” is semantically equivalent and faster.  Here’s some sample
code I temporarily threw into SpatialExample.java that works on Lucene
trunk.  You’ll see a difference of Document vs StoredDocument with 4x:

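    {
      // Per the note above: for point-only data, SpatialOperation.Intersects
      // is semantically equivalent to IsWithin and faster.
      SpatialArgs args = new SpatialArgs(SpatialOperation.IsWithin,
          ctx.makeRectangle(-90, -60, 30, 40));

      indexSearcher.search(strategy.makeQuery(args),
          new SimpleCollector() {
            // Reader for the segment currently being collected.
            private AtomicReader reader;

            @Override
            public boolean acceptsDocsOutOfOrder() {
              return true; // result order does not matter here
            }

            @Override
            protected void doSetNextReader(AtomicReaderContext context)
                throws IOException {
              this.reader = context.reader();
            }

            @Override
            public void collect(int docId) throws IOException {
              // docId is relative to the current segment's reader.
              // StoredDocument on trunk; Document on 4x.
              StoredDocument doc = reader.document(docId);
              System.out.println(doc.get("id") + "\t"
                  + doc.get("myGeoField"));
            }
          });
    }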
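
As a side note on the “IsWithin” vs “Intersects” remark, the following is a minimal sketch of why the two predicates agree for point data. It uses a hypothetical `Rect` class with inclusive edges (an assumption), not Spatial4j or Lucene types: a point has no extent, so it touches the query rectangle exactly when it lies inside it, whereas a shape with area can overlap without being contained.

```java
/** Stand-in geometry only; these are not Spatial4j or Lucene classes. */
class PointPredicateSketch {

    /** Hypothetical axis-aligned rectangle with inclusive edges (an assumption). */
    static final class Rect {
        final double minX, maxX, minY, maxY;

        // Argument order mirrors the makeRectangle(minX, maxX, minY, maxY) calls in this thread.
        Rect(double minX, double maxX, double minY, double maxY) {
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }

        /** A point modeled as a rectangle with zero width and height. */
        static Rect point(double x, double y) {
            return new Rect(x, x, y, y);
        }

        /** True if the two rectangles share at least one location. */
        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX
                && minY <= o.maxY && o.minY <= maxY;
        }

        /** True if o lies entirely inside this rectangle. */
        boolean contains(Rect o) {
            return minX <= o.minX && o.maxX <= maxX
                && minY <= o.minY && o.maxY <= maxY;
        }
    }

    public static void main(String[] args) {
        Rect query = new Rect(-90, -60, 30, 40);
        Rect[] points = {
            Rect.point(-75, 35), // inside
            Rect.point(-90, 30), // on a corner
            Rect.point(-50, 35), // outside
        };
        for (Rect p : points) {
            // The two predicates agree for every point.
            System.out.println(query.intersects(p) == query.contains(p)); // true
        }
        Rect box = new Rect(-95, -85, 32, 38); // overlaps the query's left edge
        System.out.println(query.intersects(box) + " " + query.contains(box)); // true false
    }
}
```

If the index ever holds shapes other than points, the two predicates diverge as the last line shows, so the swap is only safe for point-only data.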



~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

