You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by amg qas <am...@gmail.com> on 2011/01/22 20:32:21 UTC

Filter documents on a field value while searching the index

Hi,

I have couple of questions on filtering result set while performing a
search in lucene index :

1) I want to filter the document set returned when searching an index
based on a match on a particular field.
For eg if I have a Field in my index called CompanyName - then while
searching for documents that match
some query I want to restict the result set to only those documents
where CompanyName = 'Foo'. I looked
at the existing Filters in Lucene and could not figure out a way to
accomplish this using any of those filters.

2) Filter result set on a collection of values - for eg in the case
mentioned in 1 can I filter documents
where CompanyName = 'Foo' OR 'Bar' OR 'FooBar'.

3) Filter result set on multiple fields - for eg if my index has the
fields CompanyName & Location -
can I filter documents where CompanyName = 'Foo' & Location = 'Arizona'.

4) Filter result set on multiple fileds on collection of values for
each field - for eg if my index has the fields
CompanyName & Location - can I filter documents where (CompanyName =
'Foo' OR 'Bar') & (Location = 'Arizona'
OR 'Washington').

Thanks,
-amg

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Filter documents on a field value while searching the index

Posted by Erick Erickson <er...@gmail.com>.
See below...

On Sat, Jan 22, 2011 at 3:54 PM, amg qas <am...@gmail.com> wrote:

> Hi Eric,
>
> Thanks for the answer.. This does works for me in most cases..
>
> I am actually new to lucene and still getting acquainted to the
> various features exposed by it.
> When I looked at the API of IndexSearcher I thought that it is the
> purpose of the filter class
> to filter the returned result set.
>
> So :
>  public TopDocs search(Weight weight, Filter filter, int nDocs)
> throws IOException {
>
> So my thinking was that there should be some Filter available in
> lucene which would filter out
> the resultset on match on a field.
>
>
What you have to do here is *construct* a filter. This is often accomplished
by walking the terms (via TermDocs) to construct your Filter, then passing
that filter in. The Filter can be cached (CachingWrapperFilter as I
remember)
if desired for later reuse.


> I think the reason this functionality is not there in Lucene is
> because searching a subset of the
> index will be much faster - and this can be specified in BooleanQuery.
>
>
I *think* a filter restricts the documents to be scored, which is, indeed,
faster. But
the cost is that the field filtered on doesn't contribute to relevancy
scoring. Which
may be OK. The question to answer first is whether you care. If simply
adding the
clauses to the query is simpler and fast enough, the added complexity of
creating a filter is debatable. That said, you know your problem space and I
don't....


> However what if I have a use case where I want to score documents on
> all the fields but filter
> them later on on certian fields ? How shoudl I handle this in Lucene ?
>
>
This confuses me. Why do you want to? If you're not going to return the docs
anyway,
why filter them later?

Best
Erick


> -amg
>
> On Sat, Jan 22, 2011 at 12:32 PM, Erick Erickson
> <er...@gmail.com> wrote:
> > I guess I don't see what the problem is. These look to me like
> > standard Lucene query syntax options. If I'm off base here,
> > let me know.
> >
> > If you're building your own BooleanQuery,
> > you can add these as sub-clauses
> >
> > Here's the Lucene query syntax:
> > http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
> >
> > See below
> >
> > Best
> > Erick
> >
> > On Sat, Jan 22, 2011 at 2:32 PM, amg qas <am...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I have couple of questions on filtering result set while performing a
> >> search in lucene index :
> >>
> >> 1) I want to filter the document set returned when searching an index
> >> based on a match on a particular field.
> >> For eg if I have a Field in my index called CompanyName - then while
> >> searching for documents that match
> >> some query I want to restict the result set to only those documents
> >> where CompanyName = 'Foo'. I looked
> >> at the existing Filters in Lucene and could not figure out a way to
> >> accomplish this using any of those filters.
> >>
> >>
> > Add a clause +CompanyName:foo
> >
> >
> >> 2) Filter result set on a collection of values - for eg in the case
> >> mentioned in 1 can I filter documents
> >> where CompanyName = 'Foo' OR 'Bar' OR 'FooBar'.
> >>
> >> Add a clause like +CompanyName:(Foo Bar)
> >
> >
> >> 3) Filter result set on multiple fields - for eg if my index has the
> >> fields CompanyName & Location -
> >> can I filter documents where CompanyName = 'Foo' & Location = 'Arizona'.
> >>
> >> +CompanyName:Foo +Location:Arizona
> >
> >
> >> 4) Filter result set on multiple fileds on collection of values for
> >> each field - for eg if my index has the fields
> >> CompanyName & Location - can I filter documents where (CompanyName =
> >> 'Foo' OR 'Bar') & (Location = 'Arizona'
> >> OR 'Washington').
> >>
> >> +CompanyName:(Foo Bar) +Location:(Arizona Washington)
> >
> >
> >> Thanks,
> >> -amg
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Filter documents on a field value while searching the index

Posted by amg qas <am...@gmail.com>.
Hi Eric,

Thanks for the answer.. This does works for me in most cases..

I am actually new to lucene and still getting acquainted to the
various features exposed by it.
When I looked at the API of IndexSearcher I thought that it is the
purpose of the filter class
to filter the returned result set.

So :
  public TopDocs search(Weight weight, Filter filter, int nDocs)
throws IOException {

So my thinking was that there should be some Filter available in
lucene which would filter out
the resultset on match on a field.

I think the reason this functionality is not there in Lucene is
because searching a subset of the
index will be much faster - and this can be specified in BooleanQuery.

However what if I have a use case where I want to score documents on
all the fields but filter
them later on on certian fields ? How shoudl I handle this in Lucene ?

-amg

On Sat, Jan 22, 2011 at 12:32 PM, Erick Erickson
<er...@gmail.com> wrote:
> I guess I don't see what the problem is. These look to me like
> standard Lucene query syntax options. If I'm off base here,
> let me know.
>
> If you're building your own BooleanQuery,
> you can add these as sub-clauses
>
> Here's the Lucene query syntax:
> http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
>
> See below
>
> Best
> Erick
>
> On Sat, Jan 22, 2011 at 2:32 PM, amg qas <am...@gmail.com> wrote:
>
>> Hi,
>>
>> I have couple of questions on filtering result set while performing a
>> search in lucene index :
>>
>> 1) I want to filter the document set returned when searching an index
>> based on a match on a particular field.
>> For eg if I have a Field in my index called CompanyName - then while
>> searching for documents that match
>> some query I want to restict the result set to only those documents
>> where CompanyName = 'Foo'. I looked
>> at the existing Filters in Lucene and could not figure out a way to
>> accomplish this using any of those filters.
>>
>>
> Add a clause +CompanyName:foo
>
>
>> 2) Filter result set on a collection of values - for eg in the case
>> mentioned in 1 can I filter documents
>> where CompanyName = 'Foo' OR 'Bar' OR 'FooBar'.
>>
>> Add a clause like +CompanyName:(Foo Bar)
>
>
>> 3) Filter result set on multiple fields - for eg if my index has the
>> fields CompanyName & Location -
>> can I filter documents where CompanyName = 'Foo' & Location = 'Arizona'.
>>
>> +CompanyName:Foo +Location:Arizona
>
>
>> 4) Filter result set on multiple fileds on collection of values for
>> each field - for eg if my index has the fields
>> CompanyName & Location - can I filter documents where (CompanyName =
>> 'Foo' OR 'Bar') & (Location = 'Arizona'
>> OR 'Washington').
>>
>> +CompanyName:(Foo Bar) +Location:(Arizona Washington)
>
>
>> Thanks,
>> -amg
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Filter documents on a field value while searching the index

Posted by Erick Erickson <er...@gmail.com>.
I guess I don't see what the problem is. These look to me like
standard Lucene query syntax options. If I'm off base here,
let me know.

If you're building your own BooleanQuery,
you can add these as sub-clauses

Here's the Lucene query syntax:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html

See below

Best
Erick

On Sat, Jan 22, 2011 at 2:32 PM, amg qas <am...@gmail.com> wrote:

> Hi,
>
> I have couple of questions on filtering result set while performing a
> search in lucene index :
>
> 1) I want to filter the document set returned when searching an index
> based on a match on a particular field.
> For eg if I have a Field in my index called CompanyName - then while
> searching for documents that match
> some query I want to restict the result set to only those documents
> where CompanyName = 'Foo'. I looked
> at the existing Filters in Lucene and could not figure out a way to
> accomplish this using any of those filters.
>
>
Add a clause +CompanyName:foo


> 2) Filter result set on a collection of values - for eg in the case
> mentioned in 1 can I filter documents
> where CompanyName = 'Foo' OR 'Bar' OR 'FooBar'.
>
> Add a clause like +CompanyName:(Foo Bar)


> 3) Filter result set on multiple fields - for eg if my index has the
> fields CompanyName & Location -
> can I filter documents where CompanyName = 'Foo' & Location = 'Arizona'.
>
> +CompanyName:Foo +Location:Arizona


> 4) Filter result set on multiple fileds on collection of values for
> each field - for eg if my index has the fields
> CompanyName & Location - can I filter documents where (CompanyName =
> 'Foo' OR 'Bar') & (Location = 'Arizona'
> OR 'Washington').
>
> +CompanyName:(Foo Bar) +Location:(Arizona Washington)


> Thanks,
> -amg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>