Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/07/27 22:00:18 UTC

An idea for an intersection type of filter query

I've been looking at the slow queries our Solr installation is 
receiving.  They are dominated by queries with a simple q parameter 
(often *:* for all docs) and a VERY complicated fq parameter.  The 
filter query is built by going through a set of rules for the user and 
putting together each rule's query clause separated by OR -- we can't 
easily break it into multiple filters.

In addition to making the queries themselves run slowly, this causes 
long autowarm times for our filterCache -- my filterCache autowarmCount 
is tiny (4), but it sometimes takes 30 seconds to warm.

I've seen a number of requests here for the ability to have multiple fq 
parameters ORed together.  This is probably possible, but in the 
interests of compatibility between versions, very impractical.  What if 
a new parameter was introduced?  It could be named fqi, for filter query 
intersection.  To figure out the final bitset for multiple fq and fqi 
parameters, it would use this kind of logic:

fq AND fq AND fq AND (fqi OR fqi OR fqi)

This would let us break our filters into manageable pieces that can 
efficiently populate the filterCache, and they would autowarm quickly.
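
As a rough illustration of the request side only, here is a minimal 
Python sketch.  It assumes the proposed fqi parameter, which does not 
exist in Solr today; the function names and the example clauses are 
made up for illustration.

```python
from urllib.parse import urlencode


def build_params_today(rule_clauses, required_filters=()):
    """Current approach: every rule clause ORed into one large fq."""
    params = [("q", "*:*")]
    params += [("fq", clause) for clause in required_filters]
    if rule_clauses:
        big_filter = " OR ".join(f"({clause})" for clause in rule_clauses)
        params.append(("fq", big_filter))
    return urlencode(params)


def build_params_proposed(rule_clauses, required_filters=()):
    """Proposed approach: one hypothetical 'fqi' parameter per rule clause.

    Ordinary fq parameters are still sent separately and would still be
    ANDed together, as they are now.
    """
    params = [("q", "*:*")]
    params += [("fq", clause) for clause in required_filters]
    params += [("fqi", clause) for clause in rule_clauses]
    return urlencode(params)
```

With the second form, each rule clause is a small, separate parameter 
value, so each one could in principle get its own filterCache entry 
instead of one huge entry per user.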

Is the filter design in Solr separated cleanly enough to make this at 
all reasonable?  I'm not a Java developer, so I'd have a tough time 
implementing it myself.  When I have a free moment I will take a look at 
the code anyway.  I'm trying to teach myself Java.

Thanks,
Shawn


Re: An idea for an intersection type of filter query

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/27/2011 2:00 PM, Shawn Heisey wrote:
> I've seen a number of requests here for the ability to have multiple 
> fq parameters ORed together.  This is probably possible, but in the 
> interests of compatibility between versions, very impractical.  What 
> if a new parameter was introduced?  It could be named fqi, for filter 
> query intersection.  To figure out the final bitset for multiple fq 
> and fqi parameters, it would use this kind of logic:
>
> fq AND fq AND fq AND (fqi OR fqi OR fqi)

Thinking about this after I sent it, I realized that I don't mean 
intersection; that's what filter queries already do. :)  I meant union, 
so fqu would be a better parameter name.

Shawn


Re: An idea for an intersection type of filter query

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/31/2011 8:18 PM, Chris Hostetter wrote:
> the syntax isn't really the hard part.  where things get tricky is in the
> internals of the SolrIndexSearcher and SearchHandler so that you cache
> those "fqu" params independently and then union the results, particularly
> when those fq/fqu params need to be part of the cache key for the
> queryResultCache ... a lot of little changes to the internals.
>
> It's been discussed at a high level sporadically over the years, but no
> one has had the drive/energy/knowledge to dig into the guts and make it
> work.
>
> Having built several custom faceting components over the years (that apply
> special biz rules) i can tell you that generating DocSets and then
> computing unions/intersections is easy and efficient (the
> SolrIndexSearcher/SolrCache/DocSet APIs are really straightforward), but
> anytime you want to then use that DocSet to constrain a DocList ...  you
> run into complications.

Thanks for the reply.  I never assumed implementation would be trivial.  
If it were, someone would have done it already.  Hopefully someone will 
be inspired to figure out the complications and work through them.

When I brought this up last week, I couldn't find a Jira issue 
describing it, so I was considering creating one.  Today I tried a 
different search and managed to locate SOLR-1223.  I've added a small 
note and voted for it.

Shawn


Re: An idea for an intersection type of filter query

Posted by Chris Hostetter <ho...@fucit.org>.
: fq AND fq AND fq AND (fqu OR fqu OR fqu)
: 
: It would be awesome to have a syntax that creates arbitrarily complex and
: nested AND/OR combinations, but that would be a MAJOR undertaking.  The logic
: I've mentioned above seems to be the most useful you could get with just
: having the one additional parameter.  You can get pure union by just using

the syntax isn't really the hard part.  where things get tricky is in the 
internals of the SolrIndexSearcher and SearchHandler so that you cache 
those "fqu" params independently and then union the results, particularly 
when those fq/fqu params need to be part of the cache key for the 
queryResultCache ... a lot of little changes to the internals.

It's been discussed at a high level sporadically over the years, but no 
one has had the drive/energy/knowledge to dig into the guts and make it 
work.

Having built several custom faceting components over the years (that apply 
special biz rules) i can tell you that generating DocSets and then 
computing unions/intersections is easy and efficient (the 
SolrIndexSearcher/SolrCache/DocSet APIs are really straightforward), but 
anytime you want to then use that DocSet to constrain a DocList ...  you 
run into complications.
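
to illustrate just the cache key point, here is a minimal Python sketch. 
it is NOT how the real queryResultCache key is built in Solr; ResultKey 
and make_key are hypothetical names, and fqu is the proposed parameter. 
the only point is that the fq and fqu lists have to be kept apart in 
the key, because the same clauses mean different things in each list.

```python
from typing import FrozenSet, NamedTuple


class ResultKey(NamedTuple):
    """Hypothetical cache key for a cached result list."""
    q: str
    fq: FrozenSet[str]    # clauses that are intersected (ANDed)
    fqu: FrozenSet[str]   # clauses that are unioned (ORed), then ANDed in
    sort: str


def make_key(q, fq=(), fqu=(), sort="score desc"):
    # frozenset makes the key independent of parameter order and hashable,
    # so it can be used directly as a dict key.
    return ResultKey(q, frozenset(fq), frozenset(fqu), sort)
```

two requests with identical clauses, one sending them as fq and the 
other as fqu, produce different keys here, so a cached result for one 
would not be handed back for the other.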


-Hoss

Re: An idea for an intersection type of filter query

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/27/2011 3:49 PM, Jonathan Rochkind wrote:
> I don't know the answer to feasibility either, but I'll just point out 
> that boolean "OR" corresponds to set "union", not set "intersection".  
> So I think you probably mean a 'union' type of filter query; 
> 'intersection' does not seem to describe what you are describing; 
> ordinary 'fq' values are 'intersected' already to restrict the result 
> set, no?

You're right, I noticed that later and corrected myself.  Substitute fqu 
(and try not to pronounce it) for fqi in my previous message.  This is 
the only name suggestion I could come up with on short notice, and it's 
probably a good idea to change it.

> So, anyhow, the basic goal, if I understand it right, is not to 
> provide any additional semantics, but to allow individual clauses in 
> an 'fq' "OR" to be cached and looked up in the filter cache individually.

I would like to have both intersection and union at the same time, not 
be restricted to one or the other, and have it be possible without 
altering existing functionality.  The idea is to add a new parameter 
that only changes how the resulting bitset is applied to the query 
results.  The filterCache entry would look the same whether you used fq 
or fqu.  Restating my suggested bitset logic with the changed parameter 
name:

fq AND fq AND fq AND (fqu OR fqu OR fqu)

It would be awesome to have a syntax that creates arbitrarily complex 
and nested AND/OR combinations, but that would be a MAJOR undertaking.  
The logic I've mentioned above seems to be the most useful you could get 
with just having the one additional parameter.  You can get pure union 
by just using fqu.  The existing model of pure intersection would be 
maintained when only fq is present.
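
To make the logic concrete, here is a minimal Python sketch.  It is 
only a model of the idea, not Solr code: plain Python integers stand in 
for the cached bitsets, and the function names are made up.

```python
def bitset(*doc_ids):
    """Build a toy bitset (a Python int) with one bit set per document id."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def doc_ids(bits):
    """List the document ids whose bits are set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def combine_filters(all_docs, fq_bitsets, fqu_bitsets):
    """Compute fq AND fq AND ... AND (fqu OR fqu OR ...)."""
    result = all_docs
    for bits in fq_bitsets:        # existing behavior: pure intersection
        result &= bits
    if fqu_bitsets:                # proposed: union the fqu sets first
        union = 0
        for bits in fqu_bitsets:
            union |= bits
        result &= union            # then intersect the union with the rest
    return result
```

With no fqu bitsets the function reduces to today's intersection-only 
behavior, and with no fq bitsets it is a pure union.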

Thanks,
Shawn


Re: An idea for an intersection type of filter query

Posted by Jonathan Rochkind <ro...@jhu.edu>.
I don't know the answer to feasibility either, but I'll just point out 
that boolean "OR" corresponds to set "union", not set "intersection".  
So I think you probably mean a 'union' type of filter query; 
'intersection' does not seem to describe what you are describing; 
ordinary 'fq' values are 'intersected' already to restrict the result 
set, no?

So, anyhow, the basic goal, if I understand it right, is not to provide 
any additional semantics, but to allow individual clauses in an 'fq' 
"OR" to be cached and looked up in the filter cache individually.

Perhaps someone (not me) who understands the Solr architecture better 
might also have another suggestion for how to get to that goal, other 
than the specific thing you suggested. I do not know, sorry.

Hmm, but I start thinking, what about a general purpose mechanism to 
identify a sub-clause that should be fetched/retrieved from the filter 
cache. I don't _think_ current nested queries will do that:

fq=_query_:"foo:bar" OR _query_:"foo:baz"

That's legal now (and doesn't accomplish much) -- but what if the 
individual subquery components could consult the filter cache 
separately?  I don't know if nested query is the right way to do that or 
not, but I'm thinking some mechanism where you could arbitrarily 
identify clauses that should be filter cached independently?
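
To show what I mean by caching clauses independently, here is a minimal 
Python sketch.  It is a toy model, not Solr's filter cache: ClauseCache 
is a hypothetical name, Python sets stand in for doc sets, and the 
evaluate callable stands in for actually running a clause against the 
index.

```python
class ClauseCache:
    """Toy per-clause cache: each OR sub-clause is looked up on its own."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # callable: clause string -> iterable of doc ids
        self._cache = {}
        self.misses = 0             # how many clauses had to be evaluated

    def docset(self, clause):
        """Return the doc set for one clause, evaluating it only on a miss."""
        if clause not in self._cache:
            self.misses += 1
            self._cache[clause] = frozenset(self._evaluate(clause))
        return self._cache[clause]

    def union(self, clauses):
        """OR the clauses together using their individually cached doc sets."""
        result = set()
        for clause in clauses:
            result |= self.docset(clause)
        return result
```

If one filter is "foo:bar OR foo:baz" and a later one is "foo:baz OR 
foo:qux", the second only has to evaluate foo:qux, because foo:baz is 
already cached.  Caching the whole OR expression as one entry would get 
no reuse at all between the two.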

Jonathan
