Posted to dev@lucene.apache.org by Antony Bowesman <ad...@thorntothehorn.org> on 2011/04/19 07:29:59 UTC

Filters with 2.9.4

Hi,

Another 2.9.4 migration issue for me...

When a search is done by a user, I collect a 'DocSet' of Documents for that
'owner'  (Term("id", "XX")).  This is a single set for all Documents in the index
and NOT per reader.

Then when searches are made I use caching Filters, but I use my master DocSet as
a Filter for those chained Filters.  However, with 2.9, Filters are now called
per segment reader and there's a DocIdSet for each Reader.  Unlike the
collector, the filter implementation has no way to know the docBase of the
passed reader.

As the Javadocs for Filter.getDocIdSet imply, a Filter must only return doc ids 
for the given reader.

I am now stuck with a filter implementation that can no longer intersect the
master bitset for my 'owners'.
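To make the docBase problem concrete, here is a minimal sketch (plain java.util.BitSet, no Lucene classes) of what a filter would have to do to reuse one index-wide bitset segment by segment. It assumes, as IndexSearcher does, that a segment's docBase is the sum of the maxDoc values of the segments before it, so a segment-local id is the global id minus docBase. The class name, method names and segment sizes are hypothetical. In 2.9, Filter.getDocIdSet(IndexReader) is never handed that docBase, which is the gap described above.

```java
import java.util.BitSet;

// Hypothetical illustration: slicing one index-wide ("master") bitset into
// segment-local bitsets. This needs each segment's docBase, which a 2.9
// Filter does not receive.
public class DocBaseSlicing {

    /** docBase of each segment = sum of maxDoc of all earlier segments. */
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;              // first global id owned by segment i
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    /** Copies global ids [docBase, docBase + maxDoc) into a segment-local bitset. */
    static BitSet slice(BitSet master, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        for (int g = master.nextSetBit(docBase); g >= 0 && g < docBase + maxDoc;
                g = master.nextSetBit(g + 1)) {
            local.set(g - docBase);       // local id = global id - docBase
        }
        return local;
    }

    public static void main(String[] args) {
        int[] maxDocs = {5, 3, 4};        // hypothetical segments, 12 docs in total
        BitSet master = new BitSet();     // index-wide "owner" set (global ids)
        master.set(1);
        master.set(6);
        master.set(9);
        master.set(11);
        int[] bases = docBases(maxDocs);
        for (int i = 0; i < maxDocs.length; i++) {
            System.out.println("segment " + i + " docBase=" + bases[i]
                    + " local=" + slice(master, bases[i], maxDocs[i]));
        }
    }
}
```

With docBases 0, 5 and 8, main prints the local sets {1}, {1} and {1, 3}: the same global set, renumbered per segment.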

Was this envisaged during the changes, and is there a way I can get hold of the
docBase for an IndexReader?

Thanks
Antony


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Filters with 2.9.4

Posted by Antony Bowesman <ad...@thorntothehorn.org>.
Thanks, Uwe.  I'll work towards using CachingWrapperFilter.
Antony


On 27/04/2011 9:33 PM, Uwe Schindler wrote:
> Hi,
>
> In Lucene trunk the Filter gets a ReaderContext, which contains the doc base
> if available.
>
> For Lucene 2 and 3 this is not available. The Lucene 2.9 code did not change
> documented behavior. The fact that Filters always got the top-level reader
> was never documented (it was just like that in early Lucene versions), so
> this is not a break. The same applies not only to Filters but also to
> Scorers created by Queries: those also don't know anything about the
> top-level searcher (and they don't need to). For a filter to work this is
> not a requirement either: the IndexReader passed as a parameter is
> self-contained and provides all the information needed to process the
> current segment. You should simply fix your caching (which is much more
> effective after this change, as the cache entries don't become invalid
> after a reopen of an index where only a few segments changed).
>
> I would suggest correcting your code and using CachingWrapperFilter.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de


RE: Filters with 2.9.4

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

In Lucene trunk the Filter gets a ReaderContext, which contains the doc base if
available.

For Lucene 2 and 3 this is not available. The Lucene 2.9 code did not change
documented behavior. The fact that Filters always got the top-level reader
was never documented (it was just like that in early Lucene versions), so
this is not a break. The same applies not only to Filters but also to
Scorers created by Queries: those also don't know anything about the
top-level searcher (and they don't need to). For a filter to work this is
not a requirement either: the IndexReader passed as a parameter is
self-contained and provides all the information needed to process the
current segment. You should simply fix your caching (which is much more
effective after this change, as the cache entries don't become invalid after
a reopen of an index where only a few segments changed).
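A minimal sketch of why no docBase is needed once everything is computed per segment: if the owner filter and any other cached filter each produce a bitset of segment-local ids for the same segment, chaining them is a plain AND inside that segment. java.util.BitSet stands in for a DocIdSet here; the class name and example bits are hypothetical, not Lucene API.

```java
import java.util.BitSet;

// Sketch: intersecting two filters segment by segment. Both inputs hold
// segment-local ids for the same segment, so the AND never needs a docBase.
public class PerSegmentIntersection {

    /** Docs of this segment accepted by both filters; the inputs are left unchanged. */
    static BitSet intersect(BitSet ownerDocs, BitSet otherFilterDocs) {
        BitSet result = (BitSet) ownerDocs.clone();  // never mutate a cached set
        result.and(otherFilterDocs);
        return result;
    }

    public static void main(String[] args) {
        BitSet owner = new BitSet();   // owner's docs in one segment (local ids)
        owner.set(0);
        owner.set(2);
        owner.set(4);
        BitSet other = new BitSet();   // another cached filter, same segment
        other.set(2);
        other.set(3);
        other.set(4);
        System.out.println(intersect(owner, other));  // prints {2, 4}
    }
}
```

The clone matters because a cached per-segment set is shared between searches; ANDing into it directly would silently corrupt the cache.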

I would suggest correcting your code and using CachingWrapperFilter.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Antony Bowesman [mailto:adb@thorntothehorn.org]
> Sent: Wednesday, April 27, 2011 1:22 PM
> To: dev@lucene.apache.org
> Subject: Re: Filters with 2.9.4
> 
> Hi Uwe,
> 
> Thanks for the reply.
> 
> Things are a bit tangled, because I've used early Solr stuff with DocSet and
> have extensively used my own caching Filters because I couldn't get what I
> wanted with the standard versions a few years ago.  It will take a while to
> undo that, but I'm working towards that.
> 
> However, it still seems to me that the Filter.getDocIdSet() method should
> also be given the docBase for the given reader.  It seems odd that the
> Collector has that knowledge but the Filter does not, even though they are
> pretty closely related classes.
> 
> What do you think?
> Antony
> 
> 
> 
> On 19/04/2011 5:01 PM, Uwe Schindler wrote:
> > Hi Antony,
> >
> > Why not use CachingWrapperFilter together with a TermsFilter or
> > QueryWrapperFilter(TermQuery)? This Filter keeps track of all used
> > segment readers. So you build an instance:
> >   Filter f = new CachingWrapperFilter(
> >       new QueryWrapperFilter(new TermQuery(new Term(...))));
> >
> > And reuse that filter instance for all queries the user starts. No
> > need to hack the cache yourself. The above variant is much more
> > effective, as it works better with reopen()'ed index readers (after the
> > index has changed), because it reuses the unchanged segment readers.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Antony Bowesman [mailto:adb@thorntothehorn.org]
> >> Sent: Tuesday, April 19, 2011 7:30 AM
> >> To: Lucene Dev
> >> Subject: Filters with 2.9.4
> >>
> >> Hi,
> >>
> >> Another 2.9.4 migration issue for me...
> >>
> >> When a search is done by a user, I collect a 'DocSet' of Documents
> >> for that 'owner'  (Term("id", "XX")).  This is a single set for all
> >> Documents in the index and NOT per reader.
> >>
> >> Then when searches are made I use caching Filters, but I use my
> >> master DocSet as a Filter for those chained Filters.  However, with
> >> 2.9, Filters are now called per segment reader and there's a DocIdSet
> >> for each Reader.  Unlike the collector, the filter implementation has
> >> no way to know the docBase of the passed reader.
> >>
> >> As the Javadocs for Filter.getDocIdSet imply, a Filter must only
> >> return doc ids for the given reader.
> >>
> >> I am now stuck with a filter implementation that can no longer
> >> intersect the master bitset for my 'owners'.
> >>
> >> Was this envisaged during the changes, and is there a way I can get
> >> hold of the docBase for an IndexReader?
> >>
> >> Thanks
> >> Antony


Re: Filters with 2.9.4

Posted by Antony Bowesman <ad...@thorntothehorn.org>.
Hi Uwe,

Thanks for the reply.

Things are a bit tangled, because I've used early Solr stuff with DocSet and 
have extensively used my own caching Filters because I couldn't get what I 
wanted with the standard versions a few years ago.  It will take a while to undo 
that, but I'm working towards that.

However, it still seems to me that the Filter.getDocIdSet() method should also
be given the docBase for the given reader.  It seems odd that the Collector has
that knowledge but the Filter does not, even though they are pretty closely
related classes.
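The asymmetry can be pictured with a toy model of the 2.9 Collector contract, in which the searcher calls setNextReader(IndexReader reader, int docBase) for each segment and then passes segment-local ids to collect(int doc), while Filter.getDocIdSet(IndexReader) gets the reader only. The class below is a hypothetical stand-in written without Lucene; it assumes three segments of 5, 3 and 4 documents (docBases 0, 5 and 8).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the per-segment Collector contract (not Lucene's
// Collector class): the searcher announces each segment's docBase, then
// passes segment-local doc ids to collect().
public class GlobalIdCollector {
    private int docBase;
    final List<Integer> globalHits = new ArrayList<>();

    /** Plays the role of setNextReader(reader, docBase): remember the offset. */
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    /** doc is segment-local; docBase + doc is the index-wide id. */
    void collect(int doc) {
        globalHits.add(docBase + doc);
    }

    public static void main(String[] args) {
        GlobalIdCollector c = new GlobalIdCollector();
        c.setNextSegment(0);            // segment 0, maxDoc 5
        c.collect(1);
        c.setNextSegment(5);            // segment 1 starts at global id 5
        c.collect(1);
        c.setNextSegment(8);            // segment 2 starts at global id 8
        c.collect(1);
        c.collect(3);
        System.out.println(c.globalHits);  // prints [1, 6, 9, 11]
    }
}
```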

What do you think?
Antony



On 19/04/2011 5:01 PM, Uwe Schindler wrote:
> Hi Antony,
>
> Why not use CachingWrapperFilter together with a TermsFilter or
> QueryWrapperFilter(TermQuery)? This Filter keeps track of all used segment
> readers. So you build an instance:
>   Filter f = new CachingWrapperFilter(
>       new QueryWrapperFilter(new TermQuery(new Term(...))));
>
> And reuse that filter instance for all queries the user starts. No need to
> hack the cache yourself. The above variant is much more effective, as it
> works better with reopen()'ed index readers (after the index has changed),
> because it reuses the unchanged segment readers.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Antony Bowesman [mailto:adb@thorntothehorn.org]
>> Sent: Tuesday, April 19, 2011 7:30 AM
>> To: Lucene Dev
>> Subject: Filters with 2.9.4
>>
>> Hi,
>>
>> Another 2.9.4 migration issue for me...
>>
>> When a search is done by a user, I collect a 'DocSet' of Documents for
>> that 'owner'  (Term("id", "XX")).  This is a single set for all Documents
>> in the index and NOT per reader.
>>
>> Then when searches are made I use caching Filters, but I use my master
>> DocSet as a Filter for those chained Filters.  However, with 2.9, Filters
>> are now called per segment reader and there's a DocIdSet for each Reader.
>> Unlike the collector, the filter implementation has no way to know the
>> docBase of the passed reader.
>>
>> As the Javadocs for Filter.getDocIdSet imply, a Filter must only return
>> doc ids for the given reader.
>>
>> I am now stuck with a filter implementation that can no longer intersect
>> the master bitset for my 'owners'.
>>
>> Was this envisaged during the changes, and is there a way I can get hold
>> of the docBase for an IndexReader?
>>
>> Thanks
>> Antony


RE: Filters with 2.9.4

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Antony,

Why not use CachingWrapperFilter together with a TermsFilter or
QueryWrapperFilter(TermQuery)? This Filter keeps track of all used segment
readers. So you build an instance:
  Filter f = new CachingWrapperFilter(
      new QueryWrapperFilter(new TermQuery(new Term(...))));

And reuse that filter instance for all queries the user starts. No need to
hack the cache yourself. The above variant is much more effective, as it
works better with reopen()'ed index readers (after the index has changed),
because it reuses the unchanged segment readers.
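The reopen benefit can be modelled without Lucene: cache one result per segment reader, keyed by the reader object itself, so a reopened searcher that still contains an unchanged segment finds that segment's entry again and only new segments are computed. CachingWrapperFilter follows this idea (in 2.9 it keeps a WeakHashMap keyed on the IndexReader). The Segment class, ownerFilter and the computations counter below are hypothetical stand-ins for a segment IndexReader, the wrapped term filter, and instrumentation.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Sketch of per-segment filter caching in the spirit of CachingWrapperFilter.
public class PerSegmentCache {

    /** Hypothetical stand-in for a segment reader: the "id" value of each local doc. */
    static final class Segment {
        final String[] ids;
        Segment(String... ids) { this.ids = ids; }
    }

    // Keyed by segment identity (Segment does not override equals/hashCode);
    // entries can be dropped once a segment is no longer referenced anywhere.
    private final Map<Segment, BitSet> cache = new WeakHashMap<>();
    private final Function<Segment, BitSet> filter;
    int computations = 0;  // how often the wrapped filter actually ran

    PerSegmentCache(Function<Segment, BitSet> filter) { this.filter = filter; }

    /** Returns the segment-local bitset, computing it only on first use. */
    synchronized BitSet getDocIdSet(Segment segment) {
        BitSet bits = cache.get(segment);
        if (bits == null) {
            bits = filter.apply(segment);
            computations++;
            cache.put(segment, bits);
        }
        return bits;
    }

    /** Hypothetical term filter: local docs whose "id" equals owner. */
    static Function<Segment, BitSet> ownerFilter(String owner) {
        return seg -> {
            BitSet bits = new BitSet(seg.ids.length);
            for (int doc = 0; doc < seg.ids.length; doc++) {
                if (seg.ids[doc].equals(owner)) {
                    bits.set(doc);
                }
            }
            return bits;
        };
    }

    public static void main(String[] args) {
        PerSegmentCache cached = new PerSegmentCache(ownerFilter("XX"));
        Segment s0 = new Segment("XX", "YY", "XX");
        Segment s1 = new Segment("YY", "XX");
        for (Segment s : new Segment[] {s0, s1}) {
            cached.getDocIdSet(s);               // first searcher: both computed
        }
        Segment s2 = new Segment("XX");          // "reopen": one new segment added
        for (Segment s : new Segment[] {s0, s1, s2}) {
            cached.getDocIdSet(s);               // only s2 is computed
        }
        System.out.println("computations = " + cached.computations);  // 3, not 5
    }
}
```

Because every cached set holds segment-local ids, nothing in the cache depends on where a segment sits in the index, so it stays valid across reopens.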

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Antony Bowesman [mailto:adb@thorntothehorn.org]
> Sent: Tuesday, April 19, 2011 7:30 AM
> To: Lucene Dev
> Subject: Filters with 2.9.4
> 
> Hi,
> 
> Another 2.9.4 migration issue for me...
>
> When a search is done by a user, I collect a 'DocSet' of Documents for
> that 'owner'  (Term("id", "XX")).  This is a single set for all Documents
> in the index and NOT per reader.
>
> Then when searches are made I use caching Filters, but I use my master
> DocSet as a Filter for those chained Filters.  However, with 2.9, Filters
> are now called per segment reader and there's a DocIdSet for each Reader.
> Unlike the collector, the filter implementation has no way to know the
> docBase of the passed reader.
>
> As the Javadocs for Filter.getDocIdSet imply, a Filter must only return
> doc ids for the given reader.
>
> I am now stuck with a filter implementation that can no longer intersect
> the master bitset for my 'owners'.
>
> Was this envisaged during the changes, and is there a way I can get hold
> of the docBase for an IndexReader?
> 
> Thanks
> Antony