Posted to solr-user@lucene.apache.org by svante karlsson <sa...@csi.se> on 2014/01/23 12:42:25 UTC

how to write an efficient query with a subquery to restrict the search space?

I have a Solr db containing 1 billion records that I'm trying to use in a
NoSQL fashion.

What I want to do is find the best matches using all search terms, but
restrict the search space using only the rarest terms.

In this example I know that val2 and val4 are rare terms and that val1 and
val3 are more common. In my real scenario I'll have 20 fields that I want to
include in or exclude from the inner query depending on how rare the
requested value is.


My first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=*

but what I think I get is
.....  field4:val4 AND (field2:val2 OR field4:val4)
with that result then OR'ed with the rest.

If I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that are evaluated separately and
then joined, which is bad for performance.

What's the best way to write these types of queries?


Are there any performance issues when running it on several SolrCloud nodes
vs. a single instance, or should it scale?



/svante

Re: how to write an efficient query with a subquery to restrict the search space?

Posted by Raymond Wiker <rw...@gmail.com>.

Maybe you could move (field2:val2 OR field4:val4) into a filter? E.g.,

q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4)&fq=(field2:val2 OR field4:val4)

If I have this correctly, the fq part should be evaluated first, and may
even be found in the filter cache.
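
To make the q/fq split concrete, here is a minimal sketch of how a client might assemble the two parameters. The helper name build_params and the idea of passing the rare fields in as a list are assumptions for illustration only; they are not part of Solr or of the original question. Values are not escaped for Solr special characters here, which a real client would need to handle.

```python
from urllib.parse import urlencode


def build_params(terms, rare_fields, rows=100):
    """Split field:value terms into a scoring query (q) and a filter (fq).

    Hypothetical helper: `terms` maps field name -> requested value and
    `rare_fields` lists the fields whose values are believed to be rare.
    """
    # q keeps every term so that all of them contribute to the score.
    q = " OR ".join(f"{field}:{value}" for field, value in terms.items())
    # fq holds only the rare terms; it restricts the result set
    # and does not take part in scoring.
    fq = " OR ".join(f"{field}:{terms[field]}" for field in rare_fields)
    return {"q": q, "fq": fq, "rows": rows, "fl": "*"}


terms = {"field1": "val1", "field2": "val2", "field3": "val3", "field4": "val4"}
params = build_params(terms, rare_fields=["field2", "field4"])

# urlencode percent-escapes the colons and encodes spaces for use in a URL.
query_string = urlencode(params)
print(query_string)
```

With 20 fields, the same helper would just receive a longer dictionary and whichever subset of fields is considered rare for that particular request.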




Re: how to write an efficient query with a subquery to restrict the search space?

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

Sounds like a possible document and query routing use case.
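
As a rough sketch of what routing could look like, assuming SolrCloud's compositeId router and the _route_ request parameter, and assuming (purely for illustration) that documents are routed by a month key: the key format "2014-01", the document id, and both helper names below are made up, not taken from the poster's setup.

```python
def routed_id(shard_key, doc_id):
    # compositeId convention: "<shard key>!<document id>", so that all
    # documents sharing a key are placed on the same shard.
    return f"{shard_key}!{doc_id}"


def routed_query_params(shard_key, q, rows=100):
    # _route_ asks SolrCloud to send the query only to the shard(s)
    # responsible for this key instead of fanning out to every shard.
    return {"q": q, "_route_": f"{shard_key}!", "rows": rows}


# Hypothetical example values.
doc_id = routed_id("2014-01", "12345")
params = routed_query_params("2014-01", "field2:val2 OR field4:val4")
print(doc_id)
print(params)
```

Whether this helps depends on whether one of the fields makes a sensible routing key; it only pays off if most queries can name that key.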

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 31, 2014 7:11 AM, "svante karlsson" <sa...@csi.se> wrote:

> It seems to be faster to first restrict the search space and then do the
> scoring, compared to just using the full query and letting Solr handle
> everything.
>
> For example, in my application one of the scoring fields effectively hits
> 1/12 of the database (a month field), and if we have 100'' items in the
> database then this matters.
>
> /svante

Re: how to write an efficient query with a subquery to restrict the search space?

Posted by svante karlsson <sa...@csi.se>.

It seems to be faster to first restrict the search space and then do the
scoring, compared to just using the full query and letting Solr handle
everything.

For example, in my application one of the scoring fields effectively hits
1/12 of the database (a month field), and if we have 100'' items in the
database then this matters.

/svante


2014-01-30 Jack Krupansky <ja...@basetechnology.com>:

> Lucene's default scoring should give you much of what you want - ranking
> hits of low-frequency terms higher - without any special query syntax -
> just list out your terms and use "OR" as your default operator.
>
> -- Jack Krupansky

Re: how to write an efficient query with a subquery to restrict the search space?

Posted by Jack Krupansky <ja...@basetechnology.com>.

Lucene's default scoring should give you much of what you want - ranking 
hits of low-frequency terms higher - without any special query syntax - just 
list out your terms and use "OR" as your default operator.
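
A small worked example of why rare terms rank higher, assuming the classic Lucene TF-IDF similarity, where idf = 1 + ln(numDocs / (docFreq + 1)). The document frequencies below are hypothetical numbers chosen only to contrast a rare term with a common one, and the last lines show one way to pass "OR" as the default operator via the q.op parameter.

```python
import math


def classic_idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF: idf = 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


num_docs = 1_000_000_000  # index size from the original question

# Hypothetical document frequencies, for illustration only.
rare_idf = classic_idf(1_000, num_docs)             # term in about 1,000 docs
common_idf = classic_idf(num_docs // 12, num_docs)  # term in about 1/12 of docs

print(f"rare idf   = {rare_idf:.2f}")
print(f"common idf = {common_idf:.2f}")

# Listing the terms with OR as the default operator (q.op) lets this
# weighting decide the ranking without any special query syntax.
params = {
    "q": "field1:val1 field2:val2 field3:val3 field4:val4",
    "q.op": "OR",
    "rows": 100,
}
```

Note that this only affects ranking; every document matching any of the terms is still a candidate, which is the cost the original question is trying to avoid.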

-- Jack Krupansky
