Posted to solr-dev@lucene.apache.org by Jonathan Woods <jo...@scintillance.com> on 2007/09/06 22:37:34 UTC

Using a FunctionQuery to exclude hits

As a quick sideline to the discussion on FunctionQueries (which is way
beyond me): would it not be useful for a FunctionQuery to be able to exclude
hits (using that term loosely) as well as to affect their score?  Can/should
the current framework do that?
 
I ask because I want to exclude hits on the basis of their availability,
where 'availability' is to do with user permissions and in general can only
be determined at query time.  I tried writing a Filter to do the same job,
but couldn't find a general way to avoid having the availability criterion
assessed over all documents in the index - and it's prohibitively expensive
to evaluate.
 
Jon

Re: Using a FunctionQuery to exclude hits

Posted by Mike Klaas <mi...@gmail.com>.
On 6-Sep-07, at 2:04 PM, Jonathan Woods wrote:

> The queries are generally BooleanQuery combinations of TermQueries,
> sometimes with output from the vanilla Lucene QueryParser thrown in as one
> of the boolean clauses: "get me all pages about the Senior School && about
> drama [that's the fq bit], hopefully also containing the word
> 'Shakespeare'".  The index is made up using Documents which each correspond
> to a resource in a CMS, so I also index each resource's path (in CMS-space).
>
> At the moment I get the results as an iterator over hits; I wrap that in a
> filtering iterator which gets each Hit's Document's resource path field, and
> uses the CMS to test whether the searching user can access the corresponding
> resource, swallowing the iteration if not.  Effective hit counts are worked
> out the same way.  No performance problems yet, but it feels like a waste!
> The ideal would be if there were an AccessibleToCurrentUserQuery, i.e. a
> FunctionQuery where function evaluation happens to involve a CMS
> accessibility check.

A FunctionQuery is mostly for modifying scores/stored values based on a
function, not for applying a function to the matching problem.

You could create a custom Query class that does the desired matching  
logic, loading the data from a FieldCacheSource, then use it as fq.   
To do it efficiently would require the ability to store all cached  
CMS paths in memory.
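
To make the suggestion above concrete, here is a minimal sketch of the
matching step on its own, written without any Lucene classes so that no
real API is misrepresented.  The names InMemoryPathMatcher and pathByDocId
are hypothetical; the array merely stands in for whatever in-memory cache of
the path field is used, and the access check is passed in as a predicate.

```java
import java.util.BitSet;
import java.util.function.Predicate;

/** Hypothetical helper: matches documents by checking paths held in memory. */
final class InMemoryPathMatcher {
    // pathByDocId[i] is the CMS path of document i (assumed to fit in memory).
    private final String[] pathByDocId;

    InMemoryPathMatcher(String[] pathByDocId) {
        this.pathByDocId = pathByDocId.clone();
    }

    /** Returns the subset of candidate doc ids whose path passes the check. */
    BitSet match(BitSet candidates, Predicate<String> accessible) {
        BitSet matched = new BitSet(pathByDocId.length);
        // Only the candidates are visited, never the whole index.
        for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
            if (doc < pathByDocId.length && accessible.test(pathByDocId[doc])) {
                matched.set(doc);
            }
        }
        return matched;
    }
}
```

In this sketch the expensive check runs only for documents that the rest of
the query has already selected; a real custom Query would need its own
scorer to get the same effect.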

-Mike



RE: Using a FunctionQuery to exclude hits

Posted by Jonathan Woods <jo...@scintillance.com>.
The queries are generally BooleanQuery combinations of TermQueries,
sometimes with output from the vanilla Lucene QueryParser thrown in as one
of the boolean clauses: "get me all pages about the Senior School && about
drama [that's the fq bit], hopefully also containing the word
'Shakespeare'".  The index is made up using Documents which each correspond
to a resource in a CMS, so I also index each resource's path (in CMS-space).

At the moment I get the results as an iterator over hits; I wrap that in a
filtering iterator which gets each Hit's Document's resource path field, and
uses the CMS to test whether the searching user can access the corresponding
resource, swallowing the iteration if not.  Effective hit counts are worked
out the same way.  No performance problems yet, but it feels like a waste!
The ideal would be if there were an AccessibleToCurrentUserQuery, i.e. a
FunctionQuery where function evaluation happens to involve a CMS
accessibility check.
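
For illustration, the filtering-iterator approach described above might look
roughly like the following.  This is a sketch only: FilteringIterator,
AccessFilterDemo, visiblePaths and canAccess are hypothetical names, plain
path strings stand in for Lucene hits, and canAccess is a placeholder for
the real CMS permission check.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Wraps an iterator and yields only the elements accepted by a predicate. */
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> accept;
    private T next;          // look-ahead element, valid only when hasNext is true
    private boolean hasNext;

    FilteringIterator(Iterator<T> source, Predicate<T> accept) {
        this.source = source;
        this.accept = accept;
        advance();
    }

    // Swallow elements until one passes the check or the source runs out.
    private void advance() {
        hasNext = false;
        next = null;
        while (source.hasNext()) {
            T candidate = source.next();
            if (accept.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return hasNext; }

    @Override public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }
}

final class AccessFilterDemo {
    /** Placeholder for the CMS check; the rule here is invented for the demo. */
    static boolean canAccess(String user, String path) {
        return !path.startsWith("/staff/") || user.equals("teacher");
    }

    /** Returns the paths the user may see, in hit order. */
    static List<String> visiblePaths(List<String> hitPaths, String user) {
        List<String> visible = new ArrayList<>();
        Iterator<String> it =
            new FilteringIterator<String>(hitPaths.iterator(), p -> canAccess(user, p));
        while (it.hasNext()) visible.add(it.next());
        return visible;
    }
}
```

The effective hit count is then the size of the filtered list, which is why
the whole result set has to be walked before the count is known.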

Going the Filter route at all isn't easy in any case, because my CMS won't
(easily) tell me when user permissions have changed - so I can't tell when
it's necessary to invalidate the cache.

Jon



Re: Using a FunctionQuery to exclude hits

Posted by Yonik Seeley <yo...@apache.org>.
On 9/6/07, Jonathan Woods <jo...@scintillance.com> wrote:
> As a quick sideline to the discussion on FunctionQueries (which is way
> beyond me): would it not be useful for a FunctionQuery to be able to exclude
> hits (using that term loosely) as well as to affect their score?  Can/should
> the current framework do that?
>
> I ask because I want to exclude hits on the basis of their availability,
> where 'availability' is to do with user permissions and in general can only
> be determined at query time.  I tried writing a Filter to do the same job,
> but couldn't find a general way to avoid having the availability criterion
> assessed over all documents in the index - and it's prohibitively expensive
> to evaluate.

Can you give examples of what the queries look like (assuming they are
expressible in lucene query syntax)?

fq params often work pretty well since they are cached and reused
(hence the cost of going across the whole index is only paid once).  A
FunctionQuery would have some of the same issues... building the
FieldCache entry the first time is relatively expensive.
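
As a rough illustration of why a cached filter pays the full-index cost only
once, here is a self-contained sketch.  It is not Solr's actual filter
cache: FilterCache, fullScans and the string key are hypothetical, and the
per-document test is passed in as a predicate.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical cache of per-filter document sets, keyed by the filter text. */
final class FilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final int maxDoc;
    int fullScans = 0; // number of times the whole index was walked

    FilterCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Returns the cached doc set for the key, computing it on first use. */
    BitSet get(String filterKey, IntPredicate matches) {
        return cache.computeIfAbsent(filterKey, key -> {
            fullScans++;
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (matches.test(doc)) bits.set(doc);
            }
            return bits;
        });
    }
}
```

The first call for a given key walks every document; later calls return the
stored set.  Nothing in this sketch invalidates an entry, so it only suits
criteria that stay stable between index updates.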

-Yonik