Posted to java-user@lucene.apache.org by Nico Krijnen <ni...@dutchsoftware.com> on 2008/08/05 09:40:13 UTC
folder path prefix filtering
Hello,
Need some help with prefix filtering...
We ran into the max clause count problem with our usage of the
wildcard query. Essentially what we are trying to do is:
One of the fields in our index contains a 'path' representing a file
system location. For example:
/folder A/subfolder/document 1.pdf
/folder B/image 1.jpg
/folder B/image 2.jpg
/folder B2/image 3.jpg
/folder C/image 4.jpg
We have a security layer in our application that filters results based
on the user's permissions. These permissions (VIEW, EDIT, ...) can be
set on 'folder paths'. To filter the results we build a boolean query
with a wildcard (or prefix) query for each folder for which the user
has VIEW permissions, for example:
/folder A/subfolder/*
/folder B/*
/folder B2/*
This does exactly what we want, but because a wildcard query is
rewritten to term queries it fails when more than 1024 documents sit
below a folder (the max clause count of the rewritten boolean query).
After all, each document has a different (untokenized) term value for
the 'path' field.
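To make the failure concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class name PrefixExpansionSketch, the method expand, and the constant MAX_CLAUSES are made up for illustration, and a TreeSet stands in for the sorted terms of the 'path' field. It only models the idea that every matching term becomes one clause and that the expansion is rejected once the limit of 1024 is passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixExpansionSketch {
    // Mirrors the limit of 1024 mentioned above; the constant name is ours.
    static final int MAX_CLAUSES = 1024;

    /** Collects every term starting with prefix; fails once the limit is exceeded. */
    static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix in sort order.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            if (clauses.size() == MAX_CLAUSES) {
                throw new IllegalStateException("too many clauses for prefix " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("/folder A/subfolder/document 1.pdf");
        for (int i = 0; i < 2000; i++) {
            terms.add("/folder B/image " + i + ".jpg"); // one distinct term per document
        }
        System.out.println(expand(terms, "/folder A/").size());
        try {
            expand(terms, "/folder B/");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With one distinct path term per document, the number of clauses grows with the number of documents in the folder, which is why the limit is hit so easily here.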
After searching the web we found some alternative methods, for example
by using a PrefixFilter wrapped in a CachingWrapperFilter instead of a
query. Before we start implementing I'd like to check if anyone here
may have some more experience with queries like this or may have a
better suggestion on how to approach this?
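The filter idea can be sketched in plain Java as well. This is only a model under stated assumptions, not the Lucene PrefixFilter API: the class PathPrefixFilterSketch and the method allowedDocs are hypothetical, and a list indexed by document id stands in for the index. The point is that a filter marks one bit per document instead of building one clause per term, so there is no clause limit to exceed.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class PathPrefixFilterSketch {
    /** Sets one bit per document whose path starts with any allowed folder prefix. */
    static BitSet allowedDocs(List<String> pathsByDocId, List<String> allowedPrefixes) {
        BitSet bits = new BitSet(pathsByDocId.size());
        for (int docId = 0; docId < pathsByDocId.size(); docId++) {
            String path = pathsByDocId.get(docId);
            for (String prefix : allowedPrefixes) {
                if (path.startsWith(prefix)) {
                    bits.set(docId);
                    break; // one matching folder is enough
                }
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/folder A/subfolder/document 1.pdf",
                "/folder B/image 1.jpg",
                "/folder B/image 2.jpg",
                "/folder B2/image 3.jpg",
                "/folder C/image 4.jpg");
        // The trailing slash keeps "/folder B/" from also matching "/folder B2/".
        System.out.println(allowedDocs(paths, Arrays.asList("/folder B/")));
        System.out.println(allowedDocs(paths,
                Arrays.asList("/folder A/subfolder/", "/folder B/", "/folder B2/")));
    }
}
```

The resulting bit set is what would be cached and reused across searches for the same user, which is the role the caching wrapper plays in the approach described above.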
Kind regards,
Nico Krijnen
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: folder path prefix filtering
Posted by Steven A Rowe <sa...@syr.edu>.
Hi Nico,
On 08/05/2008 at 9:44 AM, Nico Krijnen wrote:
> On 5 aug 2008, at 11:11, Karsten F. wrote:
> > Can't you store only the relevant path in an extra lucene
> > field and set the maximum of query-terms to e.g. 2048 ?
>
> @Karsten: We did think about simplifying permissions to just top-level
> folders, which is probably suitable for 80% of our clients. If the
> filter is too slow we may have to. In that case it gets a lot simpler:
> we can add an extra field for what we call "zone" and use just a term
> query, no need for a prefix or wildcard anymore, and thus no more max
> clause count errors.
Are you aware that BooleanQuery.setMaxClauseCount() raises the max clause count that rewritten WildcardQuery and PrefixQuery instances are checked against?
<http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount(int)>
Steve
Re: folder path prefix filtering
Posted by Nico Krijnen <ni...@dutchsoftware.com>.
Thanks for the replies,
We'll try the filters then, possibly with caching if performance
requires it.
@Karsten: We did think about simplifying permissions to just top-level
folders, which is probably suitable for 80% of our clients. If the
filter is too slow we may have to. In that case it gets a lot simpler:
we can add an extra field for what we call "zone" and use just a term
query, no need for a prefix or wildcard anymore, and thus no more max
clause count errors.
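A minimal sketch of the "zone" idea, in plain Java rather than Lucene code. The names ZoneSketch, zoneOf, and isVisible are hypothetical, and treating a file directly under the root as zone "/" is an assumption of this sketch. The zone would be computed once at indexing time and stored in its own field, so the permission check becomes an exact match on one value per allowed folder.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ZoneSketch {
    /** Returns the top-level folder of an absolute path, e.g. "/folder B". */
    static String zoneOf(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        int end = path.indexOf('/', 1); // second slash closes the top-level folder
        // Assumption: a file directly under the root belongs to zone "/".
        return end < 0 ? "/" : path.substring(0, end);
    }

    /** Exact-match check, the in-memory analogue of a term query on the zone field. */
    static boolean isVisible(Set<String> allowedZones, String path) {
        return allowedZones.contains(zoneOf(path));
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("/folder A", "/folder B"));
        System.out.println(zoneOf("/folder A/subfolder/document 1.pdf"));
        System.out.println(isVisible(allowed, "/folder B/image 1.jpg"));
        System.out.println(isVisible(allowed, "/folder B2/image 3.jpg"));
    }
}
```

Because the match is exact, "/folder B" and "/folder B2" stay separate zones, and the number of clauses depends on the number of permitted folders rather than on the number of documents.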
Kind regards,
Nico Krijnen
On 5 aug 2008, at 15:31, Erick Erickson wrote:
> This situation is pretty much the kind of thing PrefixFilters
> were written for, so I'd certainly try those first, with or
> without caching. I was surprised at how fast filters
> get constructed, so I'd just try it and take a few measurements.
>
> Best
> Erick
>
On 5 aug 2008, at 11:11, Karsten F. wrote:
>
> Hi Nico Krijnen,
>
> I think it is OK to store a filter for each user session in memory.
> And I think that a cached filter is the correct approach for
> permissions.
> (extra memory usage = one bit for each user and each document)
>
> Hopefully someone with more experience will also answer your question.
>
> But I want to ask the obvious question:
>
> Is your permission-policy really on each file, or only on the top-most
> folders?
> Can't you store only the relevant path in an extra Lucene field and
> set the maximum number of query terms to e.g. 2048?
>
> Best regards
> Karsten
Re: folder path prefix filtering
Posted by Erick Erickson <er...@gmail.com>.
This situation is pretty much the kind of thing PrefixFilters
were written for, so I'd certainly try those first, with or
without caching. I was surprised at how fast filters
get constructed, so I'd just try it and take a few measurements.
Best
Erick
Re: folder path prefix filtering
Posted by "Karsten F." <ka...@fiz-technik.de>.
Hi Nico Krijnen,
I think it is OK to store a filter for each user session in memory.
And I think that a cached filter is the correct approach for permissions.
(extra memory usage = one bit for each user and each document)
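As a back-of-the-envelope check of that estimate, here is a small sketch in plain Java. The class FilterMemorySketch and the method bytesForCachedFilters are hypothetical names, and the figure covers only the raw bits; object and cache overhead are ignored.

```java
final class FilterMemorySketch {
    /** Raw size in bytes of one cached bit-per-document filter for each user. */
    static long bytesForCachedFilters(long users, long documents) {
        long bytesPerFilter = (documents + 7) / 8; // one bit per document, rounded up
        return users * bytesPerFilter;
    }

    public static void main(String[] args) {
        // Example figures only: 100 concurrent sessions over 1,000,000 documents.
        long bytes = bytesForCachedFilters(100, 1_000_000);
        System.out.println(bytes + " bytes");
    }
}
```

For those example figures each filter takes 125,000 bytes, so 100 sessions need about 12.5 million bytes in total.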
Hopefully someone with more experience will also answer your question.
But I want to ask the obvious question:
Is your permission-policy really on each file, or only on the top-most
folders?
Can't you store only the relevant path in an extra Lucene field and set the
maximum number of query terms to e.g. 2048?
Best regards
Karsten
Nico Krijnen-2 wrote:
>
> Hello,
>
> Need some help with prefix filtering...
>
> After searching the web we found some alternative methods, for example
> by using a PrefixFilter wrapped in a CachingWrapperFilter instead of a
> query. Before we start implementing I'd like to check if anyone here
> may have some more experience with queries like this or may have a
> better suggestion on how to approach this?
>
> Kind regards,
> Nico Krijnen
>