You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Nico Krijnen <ni...@dutchsoftware.com> on 2008/08/05 09:40:13 UTC

folder path prefix filtering

Hello,

Need some help with prefix filtering...
We ran into the max clause count problem with our usage of the  
wildcard query. Essentially what we are trying to do is:

One of the fields in our index contains a 'path' representing a file  
system location. For example:

/folder A/subfolder/document 1.pdf
/folder B/image 1.jpg
/folder B/image 2.jpg
/folder B2/image 3.jpg
/folder C/image 4.jpg

We have a security layer in our application that filters results based  
on the users permissions. These permissions (VIEW, EDIT, ...) can be  
set on 'folder paths'. To filter the results we build a bool query  
with a wildcard (or prefix) query for each folder for which the user  
has VIEW permissions, for example:

/folder A/subfolder/*
/folder B/*
/folder B2/*

This does exactly what we want to, but because a wildcard query is  
rewritten to term queries it fails when there are more then 1024  
documents below a folder (max clause count of rewritten bool query).  
After all, each document has a different (untokenized) term value for  
the 'path' field.

After searching the web we found some alternative methods, for example  
by using a PrefixFilter wrapped in a CachingWrapperFilter instead of a  
query. Before we start implementing I'd like to check if anyone here  
may have some more experience with queries like this or may have a  
better suggestion on how to approach this?

Kind regards,
Nico Krijnen



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: folder path prefix filtering

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Nico,

On 08/05/2008 at 9:44 AM, Nico Krijnen wrote:
> On 5 aug 2008, at 11:11, Karsten F. wrote:
> > Can't you store only the relevant path in an extra lucene
> > field and set the maximum of query-terms to e.g. 2048 ?
>
> @Karsten: We did think about simplifying permissions to just top-level
> folders, which is probably suitable for 80% of our clients. If the
> filter is too slow we may have to. In that case it gets a lot simpler:
> we can add an extra field for what we call "zone" and use just a term
> query, no need for a prefix or wildcard anymor, and thus no more max
> clause count errors.

Are aware that BooleanQuery.setMaxClauseCount() will raise the max clause count for Wildcard/PrefixQuery's?:

<http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount(int)>

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: folder path prefix filtering

Posted by Nico Krijnen <ni...@dutchsoftware.com>.
Thanks for the replies,

We'll try the filters then, possibly with cache if required for  
performance.

@Karsten: We did think about simplifying permissions to just top-level  
folders, which is probably suitable for 80% of our clients. If the  
filter is too slow we may have to. In that case it gets a lot simpler:  
we can add an extra field for what we call "zone" and use just a term  
query, no need for a prefix or wildcard anymor, and thus no more max  
clause count errors.

Kind regards,
Nico Krijnen

On 5 aug 2008, at 15:31, Erick Erickson wrote:

> This situation is pretty much the kind of thing PrefixFilters
> were written for, so I'd certainly try those first, with or
> without caching. I was surprised at how fast filters
> get constructed, so I'd just try it and take a few measurements.
>
> Best
> Erick
>

On 5 aug 2008, at 11:11, Karsten F. wrote:

>
> Hi Nico Krijnen,
>
> I think it is ok, to store a filter for each user-session im memory.
> And I think that a cached filter is the correct approach for  
> permissions.
> (extra memory usage = one bit for each user and each document)
>
> Hopefully someone with more experience will also answer your question.
>
> But I want to ask the obvious question:
>
> Is your permission-policy really on each file, or only on the top-most
> folders?
> Can't you store only the relevant path in an extra lucene field and  
> set the
> maximum of query-terms to e.g. 2048 ?
>
> Best regards
>  Karsten

> On Tue, Aug 5, 2008 at 3:40 AM, Nico Krijnen  
> <ni...@dutchsoftware.com> wrote:
>
>> Hello,
>>
>> Need some help with prefix filtering...
>> We ran into the max clause count problem with our usage of the  
>> wildcard
>> query. Essentially what we are trying to do is:
>>
>> One of the fields in our index contains a 'path' representing a  
>> file system
>> location. For example:
>>
>> /folder A/subfolder/document 1.pdf
>> /folder B/image 1.jpg
>> /folder B/image 2.jpg
>> /folder B2/image 3.jpg
>> /folder C/image 4.jpg
>>
>> We have a security layer in our application that filters results  
>> based on
>> the users permissions. These permissions (VIEW, EDIT, ...) can be  
>> set on
>> 'folder paths'. To filter the results we build a bool query with a  
>> wildcard
>> (or prefix) query for each folder for which the user has VIEW  
>> permissions,
>> for example:
>>
>> /folder A/subfolder/*
>> /folder B/*
>> /folder B2/*
>>
>> This does exactly what we want to, but because a wildcard query is
>> rewritten to term queries it fails when there are more then 1024  
>> documents
>> below a folder (max clause count of rewritten bool query). After  
>> all, each
>> document has a different (untokenized) term value for the 'path'  
>> field.
>>
>> After searching the web we found some alternative methods, for  
>> example by
>> using a PrefixFilter wrapped in a CachingWrapperFilter instead of a  
>> query.
>> Before we start implementing I'd like to check if anyone here may  
>> have some
>> more experience with queries like this or may have a better  
>> suggestion on
>> how to approach this?
>>
>> Kind regards,
>> Nico Krijnen
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: folder path prefix filtering

Posted by Erick Erickson <er...@gmail.com>.
This situation is pretty much the kind of thing PrefixFilters
were written for, so I'd certainly try those first, with or
without caching. I was surprised at how fast filters
get constructed, so I'd just try it and take a few measurements.

Best
Erick

On Tue, Aug 5, 2008 at 3:40 AM, Nico Krijnen <ni...@dutchsoftware.com> wrote:

> Hello,
>
> Need some help with prefix filtering...
> We ran into the max clause count problem with our usage of the wildcard
> query. Essentially what we are trying to do is:
>
> One of the fields in our index contains a 'path' representing a file system
> location. For example:
>
> /folder A/subfolder/document 1.pdf
> /folder B/image 1.jpg
> /folder B/image 2.jpg
> /folder B2/image 3.jpg
> /folder C/image 4.jpg
>
> We have a security layer in our application that filters results based on
> the users permissions. These permissions (VIEW, EDIT, ...) can be set on
> 'folder paths'. To filter the results we build a bool query with a wildcard
> (or prefix) query for each folder for which the user has VIEW permissions,
> for example:
>
> /folder A/subfolder/*
> /folder B/*
> /folder B2/*
>
> This does exactly what we want to, but because a wildcard query is
> rewritten to term queries it fails when there are more then 1024 documents
> below a folder (max clause count of rewritten bool query). After all, each
> document has a different (untokenized) term value for the 'path' field.
>
> After searching the web we found some alternative methods, for example by
> using a PrefixFilter wrapped in a CachingWrapperFilter instead of a query.
> Before we start implementing I'd like to check if anyone here may have some
> more experience with queries like this or may have a better suggestion on
> how to approach this?
>
> Kind regards,
> Nico Krijnen
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: folder path prefix filtering

Posted by "Karsten F." <ka...@fiz-technik.de>.
Hi Nico Krijnen,

I think it is ok, to store a filter for each user-session im memory.
And I think that a cached filter is the correct approach for permissions.
(extra memory usage = one bit for each user and each document)

Hopefully someone with more experience will also answer your question.

But I want to ask the obvious question:

Is your permission-policy really on each file, or only on the top-most
folders?
Can't you store only the relevant path in an extra lucene field and set the
maximum of query-terms to e.g. 2048 ?
 
Best regards
  Karsten


Nico Krijnen-2 wrote:
> 
> Hello,
> 
> Need some help with prefix filtering...
> 
> After searching the web we found some alternative methods, for example  
> by using a PrefixFilter wrapped in a CachingWrapperFilter instead of a  
> query. Before we start implementing I'd like to check if anyone here  
> may have some more experience with queries like this or may have a  
> better suggestion on how to approach this?
> 
> Kind regards,
> Nico Krijnen
> 

-- 
View this message in context: http://www.nabble.com/folder-path-prefix-filtering-tp18826094p18827325.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org