Posted to java-user@lucene.apache.org by eks dev <ek...@yahoo.co.uk> on 2005/12/23 12:01:52 UTC

Filter to support DocNrSkipper interface

Hi,

Would it be OK to add a method to the Filter class that
returns the DocNrSkipper interface from Paul's "Compact
sparse Filter" in JIRA issue LUCENE-328?

This would be the first step towards:
- smooth integration of compact representations of the
underlying BitSet in Filter (VInt and sorted int[]).
They are often faster for AND/OR operations.
- an enhancement of ChainedFilter (see contrib from Hoss)
that operates on DocNrSkipper (see And(Or)DocNrSkipper
in Paul's work)
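
To make the AND case concrete, here is a rough sketch of intersecting
two skippers by leapfrogging. The shape of DocNrSkipper below (one
method returning the next doc number >= target, or -1) is my
assumption for illustration; the real interface is in the LUCENE-328
patch, and SortedIntSkipper and AndSkipper are made-up names.

```java
import java.util.Arrays;

// Assumed shape of the skipper, for illustration only.
interface DocNrSkipper {
    // Smallest doc number >= target in the set, or -1 if there is none.
    int nextDocNr(int target);
}

// Skipper over a sorted int[] of doc numbers (no duplicates expected).
final class SortedIntSkipper implements DocNrSkipper {
    private final int[] docs;

    SortedIntSkipper(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    public int nextDocNr(int target) {
        int i = Arrays.binarySearch(docs, target);
        if (i < 0) {
            i = -i - 1; // not found: convert to the insertion point
        }
        return i < docs.length ? docs[i] : -1;
    }
}

// AND of two skippers: each side skips ahead to the other's candidate
// until both agree on a doc number or one side runs out.
final class AndSkipper implements DocNrSkipper {
    private final DocNrSkipper a;
    private final DocNrSkipper b;

    AndSkipper(DocNrSkipper a, DocNrSkipper b) {
        this.a = a;
        this.b = b;
    }

    public int nextDocNr(int target) {
        int x = a.nextDocNr(target);
        while (x >= 0) {
            int y = b.nextDocNr(x);
            if (y < 0) {
                return -1; // b is exhausted
            }
            if (y == x) {
                return x; // both sets contain x
            }
            x = a.nextDocNr(y); // let a catch up with b's candidate
        }
        return -1; // a is exhausted
    }
}
```

Neither side ever materializes a full BitSet here, which is the point
of operating on skippers in ChainedFilter.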

There are no compatibility problems: a BitSet still has
to be constructed in the bits() method, the same as today.
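
A minimal sketch of why this stays compatible, under some assumptions:
SketchFilter is a simplified stand-in for Lucene's Filter (the real
bits() takes an IndexReader, here it takes maxDoc so the example is
self-contained), and both the docNrSkipper() method name and the
DocNrSkipper shape are hypothetical.

```java
import java.util.BitSet;

// Assumed shape of the skipper; the actual one is in the LUCENE-328 patch.
interface DocNrSkipper {
    // Smallest doc number >= target that passes the filter, or -1 if none.
    int nextDocNr(int target);
}

// Simplified stand-in for Lucene's Filter class.
abstract class SketchFilter {
    // Existing contract: subclasses build a BitSet.
    abstract BitSet bits(int maxDoc);

    // Proposed extra method. The default just wraps the BitSet from bits(),
    // so existing subclasses keep working without any change.
    DocNrSkipper docNrSkipper(int maxDoc) {
        final BitSet bits = bits(maxDoc);
        return target -> bits.nextSetBit(target); // nextSetBit returns -1 when none is left
    }
}

// An "old style" filter that only knows about bits().
final class FixedDocsFilter extends SketchFilter {
    private final int[] docs;

    FixedDocsFilter(int... docs) {
        this.docs = docs;
    }

    BitSet bits(int maxDoc) {
        BitSet b = new BitSet(maxDoc);
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }
}
```

A filter with a compact representation would override docNrSkipper()
directly and build the BitSet in bits() only when somebody asks for it.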
 
The reasoning that justifies effort in this direction
is that the distribution of tokens in a typical collection
fits these three representations of bit vectors well
(very low-frequency tokens in a sorted int[], very
high-frequency tokens in VInt, and the rest in a BitSet).
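
To get a feeling for the sizes involved, the following sketch computes
the memory footprint of the three representations for a given list of
doc numbers. It only counts payload bytes (no object overhead), it
assumes the VInt variant stores gaps between consecutive doc numbers
at 7 bits per byte, and FilterSizes is a made-up helper name. Where
exactly the break-even points lie is something to measure, not
something this sketch decides.

```java
final class FilterSizes {
    private FilterSizes() {}

    // Bytes needed to write a non-negative int as a VInt (7 payload bits per byte).
    static int vIntLength(int value) {
        int n = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            n++;
        }
        return n;
    }

    // Sorted int[]: 4 bytes per matching document.
    static long sortedIntBytes(int[] sortedDocs) {
        return 4L * sortedDocs.length;
    }

    // BitSet: one bit per document in the index, whether it matches or not.
    static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7L) / 8;
    }

    // VInt-encoded gaps between consecutive doc numbers.
    static long vIntGapBytes(int[] sortedDocs) {
        long total = 0;
        int prev = 0;
        for (int doc : sortedDocs) {
            total += vIntLength(doc - prev);
            prev = doc;
        }
        return total;
    }
}
```

For three matching documents (5, 1000, 200000) in an index of one
million documents this gives 12 bytes for the int[], 6 bytes for the
VInt gaps and 125000 bytes for the BitSet.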

To put it another way, Filter forces us to use BitSet,
which is a rather inefficient way to store a few
documents from a big collection.

Any feedback is appreciated; it could easily be that I
have overlooked something essential.

Cheers, e.


	
	
		

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Filter to support DocNrSkipper interface

Posted by Morus Walter <mo...@googlemail.com>.
On Fri, 23 Dec 2005 11:01:52 +0000 (GMT)
eks dev <ek...@yahoo.co.uk> wrote:

> To put it another way, Filter forces us to use BitSet,
> which is a rather inefficient way to store a few
> documents from a big collection.

I cannot comment on your suggestion, but I think the current filter
should probably be replaced by a more general solution.
(I haven't had the time to read the Lucene mailing list for quite some
time, so this has probably been discussed before.)

The sort extensions already provide a cached list of document values. So
if I'd like to filter and sort on some field (e.g. date), it would seem
much more natural to me to use that list for filtering than to create a
list of acceptable documents stored as a bitset or a sparse list.

So IMHO a filter (as seen by the searcher) should just provide an
interface to query whether a given document is accepted or
rejected by the filter. So the interface would look like

public interface Filter {
	boolean filter(int docno);
}

The current type of bitset based filters could implement this interface
as well as other types of filter.
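
A small sketch of two such implementations, one backed by a BitSet and
one backed by a per-document value array of the kind the sort cache
provides. Assumptions: filter() returning true means the document is
accepted, the int[] is a plain stand-in for the cached sort values,
and BitSetFilter and CachedRangeFilter are names made up for this
example.

```java
import java.util.BitSet;

// The interface proposed above; true is assumed to mean "accepted".
interface Filter {
    boolean filter(int docno);
}

// Wraps the BitSet that today's filters already produce.
final class BitSetFilter implements Filter {
    private final BitSet bits;

    BitSetFilter(BitSet bits) {
        this.bits = bits;
    }

    public boolean filter(int docno) {
        return bits.get(docno);
    }
}

// Filters directly on cached per-document values (e.g. a date as yyyyMMdd),
// without building a separate list of acceptable documents.
final class CachedRangeFilter implements Filter {
    private final int[] valueByDoc; // stand-in for the sort cache
    private final int min;
    private final int max;

    CachedRangeFilter(int[] valueByDoc, int min, int max) {
        this.valueByDoc = valueByDoc;
        this.min = min;
        this.max = max;
    }

    public boolean filter(int docno) {
        int v = valueByDoc[docno];
        return v >= min && v <= max; // inclusive range check
    }
}
```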

One could probably go one step further and allow the filter to
modify the score as well (e.g. to allow for a scoring that degrades the
score of a hit depending on its age).
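
As a rough illustration of the age example: the sketch below halves
the score for every half-life that a document has aged. The
ScoreAdjuster interface, the AgeDecay class, the age array and the
half-life parameter are all hypothetical; nothing like this exists in
Lucene's Filter API.

```java
// Hypothetical extension of the filter idea: returns the adjusted score of a hit.
interface ScoreAdjuster {
    float adjust(int docno, float score);
}

// Exponential decay by document age.
final class AgeDecay implements ScoreAdjuster {
    private final int[] ageInDaysByDoc; // assumed to be cached per document
    private final float halfLifeDays;

    AgeDecay(int[] ageInDaysByDoc, float halfLifeDays) {
        this.ageInDaysByDoc = ageInDaysByDoc;
        this.halfLifeDays = halfLifeDays;
    }

    public float adjust(int docno, float score) {
        // Number of half-lives that have passed for this document.
        double halvings = ageInDaysByDoc[docno] / (double) halfLifeDays;
        return (float) (score * Math.pow(0.5, halvings));
    }
}
```

With a half-life of 30 days, a document that is 30 days old keeps half
of its score and one that is 60 days old keeps a quarter.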

Actually I just did an extension of IndexSearcher that provides that
kind of filtering for a distance filter based on geographic coordinates.

Of course such a change would not be backward compatible, so my
suggestion goes far beyond the change you proposed.

Morus
