Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2014/02/04 04:28:35 UTC

Re: how to write an efficient query with a subquery to restrict the search space?

Hi,

Sounds like a possible document and query routing use case.
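
A minimal sketch of what that could look like, assuming a SolrCloud collection that uses the compositeId router and that the month field mentioned below is used as the route key (that choice, the helper names, and the sample values are all hypothetical): documents are indexed with an id of the form "key!id", and queries pass the same key in the _route_ parameter so that only the shard holding that key is searched.

```python
from urllib.parse import urlencode


def routed_doc_id(route_key: str, doc_id: str) -> str:
    """Prefix the document id with a route key ("key!id").

    With the compositeId router, documents sharing a prefix are
    placed on the same shard. The key choice here is an assumption.
    """
    return f"{route_key}!{doc_id}"


def routed_query_params(route_key: str, query: str, rows: int = 100) -> str:
    """Build a query string that asks SolrCloud to search only the
    shard(s) that own the given route key, via the _route_ parameter."""
    return urlencode({"q": query, "_route_": f"{route_key}!", "rows": rows})


if __name__ == "__main__":
    # Hypothetical values: route by month, as in the example further down.
    print(routed_doc_id("2014-01", "doc42"))
    print(routed_query_params("2014-01", "field2:val2 OR field4:val4"))
```

Whether this helps depends on how evenly the route key spreads the data; a key with very few distinct values can leave shards unbalanced.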

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 31, 2014 7:11 AM, "svante karlsson" <sa...@csi.se> wrote:

> It seems to be faster to first restrict the search space and then do the
> scoring than to just use the full query and let Solr handle everything.
>
> For example, in my application one of the scoring fields effectively hits
> 1/12 of the database (a month field), and if we have 100'' items in the
> database then this matters.
>
> /svante
>
>
> 2014-01-30 Jack Krupansky <ja...@basetechnology.com>:
>
> > Lucene's default scoring should give you much of what you want - ranking
> > hits of low-frequency terms higher - without any special query syntax -
> > just list out your terms and use "OR" as your default operator.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: svante karlsson
> > Sent: Thursday, January 23, 2014 6:42 AM
> > To: solr-user@lucene.apache.org
> > Subject: how to write an efficient query with a subquery to restrict the
> > search space?
> >
> >
> > I have a solr db containing 1 billion records that I'm trying to use in a
> > NoSQL fashion.
> >
> > What I want to do is find the best matches using all search terms but
> > restrict the search space to the most unique terms.
> >
> > In this example I know that val2 and val4 are rare terms and val1 and val3
> > are more common. In my real scenario I'll have 20 fields that I want to
> > include or exclude in the inner query depending on the uniqueness of the
> > requested value.
> >
> >
> > my first approach was:
> > q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=*
> >
> > but what I think I get is
> > .....  field4:val4 AND (field2:val2 OR field4:val4)   this result is then
> > OR'ed with the rest
> >
> > if I write
> > q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
> > (field2:val2 OR field4:val4)&rows=100&fl=*
> >
> > then what I think I get is two sub-queries that are evaluated separately
> > and then joined - performance-wise this is bad.
> >
> > What's the best way to write these types of queries?
> >
> >
> > Are there any performance issues when running it on several SolrCloud
> > nodes vs a single instance, or should it scale?
> >
> >
> >
> > /svante
> >
>
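
For reference, one common way to express "restrict first, then score" in Solr is to move the restricting clause into a filter query (fq), which limits the candidate set without contributing to the score, and keep all terms in q for ranking. The sketch below only builds the request parameters; the helper name and the way rare fields are passed in are assumptions for illustration, and the field names and values are the ones from the original question.

```python
from urllib.parse import urlencode


def build_restricted_query(terms: dict, rare_fields: list, rows: int = 100) -> str:
    """Build Solr request parameters for a scored query over all terms,
    restricted by a filter query over the rare fields only.

    terms       -- mapping of field name to requested value
    rare_fields -- fields whose values are known to be rare (assumed to
                   be decided by the caller)
    """
    # q: every term takes part in scoring.
    scoring = " OR ".join(f"{field}:{value}" for field, value in terms.items())
    # fq: only the rare terms restrict the search space; fq does not affect scores.
    restrict = " OR ".join(f"{field}:{terms[field]}" for field in rare_fields)
    return urlencode({"q": scoring, "fq": restrict, "rows": rows, "fl": "*"})


if __name__ == "__main__":
    example_terms = {
        "field1": "val1",
        "field2": "val2",
        "field3": "val3",
        "field4": "val4",
    }
    print(build_restricted_query(example_terms, ["field2", "field4"]))
```

Filter queries are cached by default, so if the rare values are almost never repeated, it may be worth checking whether that caching helps or just fills the filter cache.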