
Posted to java-user@lucene.apache.org by Ben Rooney <be...@blastradius.com> on 2004/12/07 21:06:28 UTC

QueryFilter vs CachingWrapperFilter vs RangeQuery

Hello, I hope someone can help explain things to me.

I've been searching for some time and have not been able to find
anything that answers my questions.

I'm trying to understand the difference between QueryFilter and
CachingWrapperFilter: how each one works, and when you would use one
rather than the other.

Also, when exactly is the cache cleared? Looking at the source code, it
appears to be cleared when the IndexReader is released. Does this mean I
should keep a reference to the IndexSearcher until I want the cached
results to be cleared? For example, in the class that executes the
search, I would keep a static reference to the IndexSearcher, and when I
want to invalidate the cache, set it to null or create a new instance of
it?

On top of this, using a RangeQuery in a search does not seem prudent,
because it takes more than three times as long as using a filter. I can
understand why: when the range is part of the query, Lucene has to score
it for every matching document, whereas a filter only includes or
excludes documents and does no scoring.

To test them, I created an index of a 20,000-document repository in
which every file is a simple properties file. In each properties file I
set the publishDate property so that all documents fall in the year
2004.

My test runs four queries. The first is a basic query that returns all
documents in the index containing the word 'document'. The second adds
the query from the first test to a BooleanQuery, along with a RangeQuery
for the year 2004. The third uses the query from the first test together
with a QueryFilter constructed from the RangeQuery. The final test is the
same as the third, but with the QueryFilter wrapped in a
CachingWrapperFilter. Each test runs the search against the index 100
times with the same configuration.

The output from my test is as follows:


        2004-12-07 20:30:03,888 DEBUG (SearchManager.java:main:138) - 20000 total matching documents
        2004-12-07 20:30:04,602 INFO  (SearchManager.java:main:141) - query 1 - all docs - total time (ms): 768
        2004-12-07 20:30:04,653 DEBUG (SearchManager.java:main:146) - 20000 total matching documents
        2004-12-07 20:30:06,598 INFO  (SearchManager.java:main:149) - query 2 - 2004 range query - no cache - total time (ms): 1996
        2004-12-07 20:30:06,614 DEBUG (SearchManager.java:main:155) - 20000 total matching documents
        2004-12-07 20:30:07,223 INFO  (SearchManager.java:main:158) - query 3 - 2004 docs filter - no cache - total time (ms): 623
        2004-12-07 20:30:07,230 DEBUG (SearchManager.java:main:164) - 20000 total matching documents
        2004-12-07 20:30:07,838 INFO  (SearchManager.java:main:167) - query 4 - 2004 docs filter - cached - total time (ms): 613
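Dividing each logged total by the 100 iterations gives a per-search average. A ratio then shows how much more the RangeQuery version costs than the filtered runs. Here is a small sketch of that arithmetic, using only the totals from the log above (`TimingSummary` is a hypothetical helper):

```java
public class TimingSummary {

    // Total milliseconds for 100 iterations, copied from the log output.
    static final long ITERATIONS = 100;
    static final long PLAIN = 768, RANGE_QUERY = 1996, FILTER = 623, CACHED_FILTER = 613;

    /** Average milliseconds per search. */
    public static double perQuery(long totalMs) {
        return (double) totalMs / ITERATIONS;
    }

    /** How many times longer run a took than run b. */
    public static double ratio(long a, long b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        System.out.printf("plain query:   %.2f ms/search%n", perQuery(PLAIN));         // 7.68
        System.out.printf("range query:   %.2f ms/search%n", perQuery(RANGE_QUERY));   // 19.96
        System.out.printf("filter:        %.2f ms/search%n", perQuery(FILTER));        // 6.23
        System.out.printf("cached filter: %.2f ms/search%n", perQuery(CACHED_FILTER)); // 6.13
        System.out.printf("range query vs filter: %.1fx%n", ratio(RANGE_QUERY, FILTER)); // 3.2x
    }
}
```

With these numbers the range query costs about 20 ms per search, against about 6 ms for either filter: roughly 3.2 times as much.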


As can be seen, there is not much difference between the third and fourth
queries, hence my confusion about the two types of filters. Looking at
the source code, there is not much difference between them either.

The following is the test source code:


        package com.blastradius.search;

        import java.io.File;
        import java.util.Date;

        import org.apache.commons.logging.Log;
        import org.apache.commons.logging.LogFactory;
        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.queryParser.QueryParser;
        import org.apache.lucene.search.BooleanQuery;
        import org.apache.lucene.search.CachingWrapperFilter;
        import org.apache.lucene.search.Hits;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.QueryFilter;
        import org.apache.lucene.search.RangeQuery;
        import org.apache.lucene.search.Searcher;

        /**
         * Times four ways of restricting a search to documents published in 2004.
         *
         * @author brooney
         */
        public class SearchManager {

            public final static String INDEX_DIR = "index";
            public final static String ROOT_DIR = "webroot";

            public final static File rootDir = new File(SearchManager.ROOT_DIR);
            private final static Log logger = LogFactory.getLog(SearchManager.class);

            public static void main(String[] args) {

                Date start = null;
                Date end = null;
                Hits hits = null;

                try {
                    Searcher searcher = new IndexSearcher(SearchManager.INDEX_DIR);
                    Analyzer analyzer = new StandardAnalyzer();

                    Query query = QueryParser.parse("document", "contents", analyzer);
                    // inclusive range covering all of 2004
                    Query rangeQuery = new RangeQuery(new Term("publishDate", "20040101"),
                            new Term("publishDate", "20041231"), true);

                    // both clauses required: add(query, required, prohibited)
                    BooleanQuery query2004 = new BooleanQuery();
                    query2004.add(query, true, false);
                    query2004.add(rangeQuery, true, false);

                    // query 1: plain query, no date restriction
                    start = new Date();
                    for (int i = 0; i < 100; i++) {
                        hits = searcher.search(query);
                        if (i == 0) logger.debug(hits.length() + " total matching documents");
                    }
                    end = new Date();
                    logger.info("query 1 - all docs - total time (ms): "
                            + (end.getTime() - start.getTime()));

                    // query 2: date range as a scored clause of a BooleanQuery
                    start = new Date();
                    for (int i = 0; i < 100; i++) {
                        hits = searcher.search(query2004);
                        if (i == 0) logger.debug(hits.length() + " total matching documents");
                    }
                    end = new Date();
                    logger.info("query 2 - 2004 range query - no cache - total time (ms): "
                            + (end.getTime() - start.getTime()));

                    // query 3: date range as a QueryFilter (QueryFilter caches its bits per IndexReader)
                    QueryFilter filter2004 = new QueryFilter(rangeQuery);
                    start = new Date();
                    for (int i = 0; i < 100; i++) {
                        hits = searcher.search(query, filter2004);
                        if (i == 0) logger.debug(hits.length() + " total matching documents");
                    }
                    end = new Date();
                    logger.info("query 3 - 2004 docs filter - no cache - total time (ms): "
                            + (end.getTime() - start.getTime()));

                    // query 4: the same QueryFilter wrapped in a CachingWrapperFilter
                    CachingWrapperFilter cache2004 = new CachingWrapperFilter(filter2004);
                    start = new Date();
                    for (int i = 0; i < 100; i++) {
                        hits = searcher.search(query, cache2004);
                        if (i == 0) logger.debug(hits.length() + " total matching documents");
                    }
                    end = new Date();
                    logger.info("query 4 - 2004 docs filter - cached - total time (ms): "
                            + (end.getTime() - start.getTime()));

                } catch (Exception e) {
                    logger.error("unexpected exception trying to execute search", e);
                }
            }
        }



Thanks in advance for any help,
Ben

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: QueryFilter vs CachingWrapperFilter vs RangeQuery

Posted by Ben Rooney <be...@blastradius.com>.

Erik, thanks for the reply.

I get the filter now and understand how the caching works. However, the
caching is only at the filtering level, which means I can cache results
that are filtered. But if I do a basic search against the index and want
to cache that, do I need to create my own caching mechanism, or does the
IndexSearcher cache the results already? If it does, is clearing that
cache again a matter of removing any references to the IndexSearcher
instance?

Thanks again,
Ben


On Tue, 2004-07-12 at 15:18 -0500, Erik Hatcher wrote:

> On Dec 7, 2004, at 3:06 PM, Ben Rooney wrote:
> > i'm trying to understand the difference/effects between QueryFilter vs
> > CachingWrapperFilter and when you would use one vs the other and how
> > they work exactly.
> 
> QueryFilter caches the results (bit set of documents) of a query by 
> IndexReader.
> 
> CachingWrapperFilter does not actually do any filtering of its own, but 
> merely wraps the results of another non-caching filter, such as 
> DateFilter.  CachingWrapperFilter was added to disconnect caching from 
> filtering.  QueryFilter is the exception as it came first and already 
> does caching.  If you're using QueryFilter, there is no need to concern 
> yourself with CachingWrapperFilter.
> 
> > also, when exactly will the cache be cleared.  looking at the source
> > code, it appears when the IndexReader is released it would be cleared.
> > does this mean i should keep a reference to the SearchIndexer until i
> > want the results to be cleared?  for example, in a class file the
> > executes the search, i would keep a static reference to SearchIndexer
> > and then when i want to invalidate the cache, set it to null or create
> > a new instance of it?
> 
> How you keep a reference to the IndexSearcher instance is up to the 
> design of your system.  But, yes, you do need to keep a reference to it 
> for the cache to work properly.  If you use a new IndexSearcher 
> instance (I'm simplifying here, you could have an IndexReader instance 
> yourself too, but I'm ignoring that possibility) then the filtering 
> process occurs for each search rather than using the cache.
> 
> 	Erik
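Erik's description of CachingWrapperFilter is that it caches the wrapped filter's bit set per IndexReader. The cache therefore lives exactly as long as the reader you keep around. That idea can be sketched without Lucene. In this hypothetical version a plain `Object` stands in for the IndexReader, and a `Function` stands in for the wrapped filter's `bits(IndexReader)` method. A `WeakHashMap` lets an entry disappear once nothing else references its reader. This illustrates the idea; it is not Lucene's actual source.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class CachingBitsSketch {

    private final Function<Object, BitSet> wrapped; // stands in for Filter.bits(IndexReader)
    private final Map<Object, BitSet> cache = new WeakHashMap<>();
    private int computations = 0;

    public CachingBitsSketch(Function<Object, BitSet> wrapped) {
        this.wrapped = wrapped;
    }

    /** Returns the cached bits for this reader, computing them on first use. */
    public synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = wrapped.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    public synchronized int computations() {
        return computations;
    }

    public static void main(String[] args) {
        CachingBitsSketch filter = new CachingBitsSketch(reader -> {
            BitSet b = new BitSet();
            b.set(0, 5); // pretend documents 0-4 pass the filter
            return b;
        });
        Object reader = new Object();   // same "reader" reused: second call is a cache hit
        filter.bits(reader);
        filter.bits(reader);
        Object reopened = new Object(); // a new "reader": the bits are computed again
        filter.bits(reopened);
        System.out.println(filter.computations()); // prints 2
    }
}
```

Because QueryFilter already keeps a cache like this, wrapping it in CachingWrapperFilter adds a second, redundant layer. That is consistent with queries 3 and 4 taking about the same time.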

Re: Empty/non-empty field indexing question

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Correct.
No, there is no point in putting an empty field there.

Otis

--- "amigo@max3d.com" <am...@max3d.com> wrote:

> Hi Otis
> 
> What kind of implications does that produce on the search?
> 
> If I understand correctly that record would not be searched for if the
> field is not there, correct?
> But then is there a point putting an empty value in it, if an 
> application will never search for empty values?
> 
> 
> thanks
> 
> -pedja



Re: Empty/non-empty field indexing question

Posted by "amigo@max3d.com" <am...@max3d.com>.

Hi Otis

What implications does that have for searching?

If I understand correctly, that record would not be matched by a search
on that field if the field is not there, correct?
But then, is there any point in putting an empty value in it if the
application will never search for empty values?


thanks

-pedja


Otis Gospodnetic said the following on 12/8/2004 1:31 AM:

>Empty fields won't add any value, you can skip them.  Documents in an
>index don't have to be uniform.  Each Document could have a different
>set of fields.  Of course, that has some obvious implications for
>search, but is perfectly fine technically.
>
>Otis

Re: Empty/non-empty field indexing question

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Empty fields won't add any value, you can skip them.  Documents in an
index don't have to be uniform.  Each Document could have a different
set of fields.  Of course, that has some obvious implications for
search, but is perfectly fine technically.

Otis



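Following Otis's advice, an indexer can simply skip columns whose value is null or blank. The self-contained sketch below does only the selection step. The map of column names to values is a hypothetical stand-in for a MySQL row. Each entry that survives is where you would add a Field to the Lucene Document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NonEmptyFields {

    /** Keeps only the columns that have a non-blank value, preserving column order. */
    public static Map<String, String> fieldsToIndex(Map<String, String> row) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String value = e.getValue();
            if (value != null && !value.trim().isEmpty()) {
                // This is where the indexer would add a Field named e.getKey() to the Document.
                fields.put(e.getKey(), value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "");       // empty in the database: skipped
        row.put("body", "some text");
        row.put("author", null);    // NULL in the database: skipped
        System.out.println(fieldsToIndex(row)); // prints {body=some text}
    }
}
```

A document indexed this way has no title field at all, so a query on the title field will simply never match it. That is the search-side implication Otis mentions.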

Empty/non-empty field indexing question

Posted by "amigo@max3d.com" <am...@max3d.com>.

Here's probably a silly question, very newbish, but I had to ask.
Since I have MySQL records that contain over 30 fields each, and most of
them are added to the index, is it common practice to add fields with
empty values to the index for that particular record, or should the
field be omitted entirely?

What I mean is: if, say, the Title field is empty for a specific record
(in MySQL), should I still add that field to the Lucene index with an
empty value, or just skip it and only add the fields that contain
non-empty values?

thanks

-pedja





Re: Weird Behavior On Windows

Posted by Luke Shannon <ls...@hypermedia.com>.

Hey Otis;

You're right again. It turned out there was an exception around the use
of the Digester class that wasn't being written to the log. The exception
was being thrown because of a configuration issue with the server.

Everything is back to normal.

Thanks!

Luke
----- Original Message ----- 
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Tuesday, December 07, 2004 6:27 PM
Subject: Re: Weird Behavior On Windows


> The index has been modified, so you need a new IndexSearcher.  Could
> there be logic in the flaw (swap that), or could you be catching an
> Exception that is thrown only on Winblows due to Windows not letting
> you do certain things with referenced files and dirs?
>
> Otis



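Luke's problem turned out to be an exception that was caught but never written to the log. One way to make that kind of failure visible is to log the exception with its stack trace and then rethrow it. Here is a minimal sketch using `java.util.logging`. The `IndexStep` interface and the `runLogged` helper are hypothetical names, not part of Lucene or Digester.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggedStep {

    private static final Logger LOG = Logger.getLogger(LoggedStep.class.getName());

    /** A unit of indexing work that may fail, e.g. parsing a configuration file. */
    public interface IndexStep {
        void run() throws Exception;
    }

    /** Runs the step; any failure is logged with its stack trace and rethrown, never swallowed. */
    public static void runLogged(String description, IndexStep step) throws Exception {
        try {
            step.run();
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "indexing step failed: " + description, e);
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            runLogged("parse config", () -> {
                throw new IllegalStateException("bad server configuration");
            });
        } catch (Exception e) {
            System.out.println("caller saw: " + e.getMessage());
        }
    }
}
```

Rethrowing after logging keeps the caller's error handling intact, while guaranteeing the failure shows up in the log on every platform.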

Re: Weird Behavior On Windows

Posted by Luke Shannon <ls...@hypermedia.com>.

Hi Otis;

Each time a search request comes in, I create a new searcher (using the
same analyzer as during indexing). The idea about catching an error
somewhere is interesting, although in most of the places where I catch an
exception I write to a log file. Anyway, this is all I have to go on, so I
am looking into exceptions now...

Luke




Re: Weird Behavior On Windows

Posted by Otis Gospodnetic <ot...@yahoo.com>.

The index has been modified, so you need a new IndexSearcher.  Could
there be logic in the flaw (swap that), or could you be catching an
Exception that is thrown only on Winblows due to Windows not letting
you do certain things with referenced files and dirs?

Otis




Weird Behavior On Windows

Posted by Luke Shannon <ls...@hypermedia.com>.

Hello All;

Things have been running smoothly on Linux for some time. We set up a
version of the site on a Win2K machine, and that is when all the "fun"
started.

A PDF would be added to the system. The indexer would run, find the new
file, index it and successfully complete the update of the index folder.
No IO error, no errors of any kind, just like on the Linux box.

Then we would search for a term in the document, and 0 results would be
returned. To make matters worse, if I search for a term that shows up in
a bunch of documents, on Windows it finds only 2 results, whereas on
Linux it finds 50 (same content).

Using "Luke" I was able to verify that the PDF in question is in the
index. Why can't the searcher find it?

Any ideas would be welcome.

Luke




Re: QueryFilter vs CachingWrapperFilter vs RangeQuery

Posted by Otis Gospodnetic <ot...@yahoo.com>.

If you run the same query again, the IndexSearcher will go all the way
to the index again - no caching.  Some caching will be done by your
file system, possibly, but that's it.  Lucene is fast, so don't
optimize early.

Otis




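Since IndexSearcher does not cache query results, caching the listing pages Ben describes is up to the application. Here is a minimal sketch. It assumes the application can obtain some number that changes whenever content is published or removed. The `indexVersion` parameter is hypothetical; how you produce it, for example a counter your publishing code increments, is up to your system. Results are cached per query string, and the whole cache is dropped when the version changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class VersionedResultCache<V> {

    private final Map<String, V> results = new HashMap<>();
    private long cachedVersion = Long.MIN_VALUE;

    /**
     * Returns the cached result for queryKey. The search runs only when the result
     * is missing (null results are not cached) or the index version has changed.
     */
    public synchronized V get(String queryKey, long indexVersion, Supplier<V> search) {
        if (indexVersion != cachedVersion) {
            results.clear();            // content was published or removed: invalidate everything
            cachedVersion = indexVersion;
        }
        V value = results.get(queryKey);
        if (value == null) {
            value = search.get();       // e.g. run the listing query against the searcher
            results.put(queryKey, value);
        }
        return value;
    }

    public static void main(String[] args) {
        VersionedResultCache<String> cache = new VersionedResultCache<>();
        int[] searches = {0};
        Supplier<String> search = () -> "results #" + (++searches[0]);
        cache.get("type:article", 1L, search);
        cache.get("type:article", 1L, search); // served from the cache
        cache.get("type:article", 2L, search); // index changed: searched again
        System.out.println(searches[0]);       // prints 2
    }
}
```

Clearing everything on a version change is deliberately coarse. It matches "when new content is published, invalidate the caches" without having to work out which cached queries the change affected.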

Re: QueryFilter vs CachingWrapperFilter vs RangeQuery

Posted by Ben Rooney <be...@blastradius.com>.

Thanks Chris,

You are correct that I'm not sure whether I need the caching ability.
Right now it is more about understanding it, so that if we do need to
implement it, I am able to.

The reason for the caching is that we will have listing pages for
certain content types, for example a listing page of articles. This
listing will be generated by the Lucene engine using a basic query. The
page will also let users filter the articles, for example by date range,
so caching those results could be beneficial.

However, we will also potentially want to cache the basic query so that
subsequent queries hit a cache. When new content is published or content
is removed from the site, the caches will need to be invalidated so that
new results are created.

For the basic query, is there any caching mechanism built into the
IndexSearcher, or do we need to build our own?

Thanks,
Ben

On Tue, 2004-07-12 at 12:29 -0800, Chris Hostetter wrote:

> : > executes the search, i would keep a static reference to SearchIndexer
> : > and then when i want to invalidate the cache, set it to null or create
> 
> : design of your system.  But, yes, you do need to keep a reference to it
> : for the cache to work properly.  If you use a new IndexSearcher
> : instance (I'm simplifying here, you could have an IndexReader instance
> : yourself too, but I'm ignoring that possibility) then the filtering
> : process occurs for each search rather than using the cache.
> 
> Assuming you have a finite number of Filters, and assuming those Filters
> are expensive enough to be worth it...
> 
> Another approach you can take to "share" the cache among multiple
> IndexReaders is to explicitly call the bits method on your filter(s) once,
> and then cache the resulting BitSet anywhere you want (e.g. serialize it to
> disk if you so choose), and then implement a "BitsFilter" class that you
> can construct directly from a BitSet regardless of the IndexReader.  The
> downside of this approach is that it will *ONLY* work if you are certain
> that the index is never being modified.  If any documents get added, or
> the index gets re-optimized, you must regenerate all of the BitSets.
> 
> (That's why the CachingWrapperFilter's cache is keyed off of the
> IndexReader ... as long as you're re-using the same IndexReader, it knows
> that the cached BitSet must still be valid, because an IndexReader
> always sees the same index as when it was opened, even if another
> thread/process modifies it.)
> 
> 
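>         import java.util.BitSet;
>         import org.apache.lucene.index.IndexReader;
>         import org.apache.lucene.search.Filter;
>
>         // Serves a precomputed BitSet no matter which IndexReader is passed in.
>         class BitsFilter extends Filter {
>             private final BitSet bits;
>             public BitsFilter(BitSet bits) {
>                 this.bits = bits;
>             }
>             public BitSet bits(IndexReader r) {
>                 // return a copy so a caller cannot modify the shared BitSet
>                 return (BitSet) bits.clone();
>             }
>         }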
> 
> 
> 
> 
> -Hoss
> 
> 
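Hoss mentions that the precomputed BitSet can be stored anywhere, including on disk. `java.util.BitSet` is `Serializable`, so saving and restoring it with standard Java serialization might look like the sketch below. The class name, helper names and temporary file are hypothetical. As he warns, a saved BitSet is only meaningful for the exact index it was computed from. It has to be regenerated after any documents are added or the index is re-optimized.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

public class BitSetStore {

    /** Writes the filter's bits to a file using standard Java serialization. */
    public static void save(BitSet bits, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(bits);
        }
    }

    /** Reads bits written by save(); only valid for the same, unmodified index. */
    public static BitSet load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (BitSet) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BitSet in2004 = new BitSet();
        in2004.set(2);
        in2004.set(7);
        File file = File.createTempFile("filter2004", ".bits");
        file.deleteOnExit();
        save(in2004, file);
        System.out.println(load(file).equals(in2004)); // prints true
    }
}
```

The loaded BitSet is what a BitsFilter like the one above would be constructed from.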

Re: Read locks on indexes

Posted by Luke Shannon <ls...@hypermedia.com>.

I think the read locks prevent you from deleting from the index with
your reader while writing to the index with a writer at the same time.

If you never use a writer, then I guess you don't need to worry about this.

But how do you create the indexes?

Luke

----- Original Message ----- 
From: "Shawn Konopinsky" <sk...@blueprint.org>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Tuesday, December 07, 2004 4:17 PM
Subject: Read locks on indexes


> Hi,
>
> I have a question regarding read locks on indexes. I have the situation
> where I have n applications (separated jvms) running queries. These
> applications are read-only, and never use an IndexWriter.
>
> The index is only ever updated using rsync. The applications don't need
> up the minute updates, only the data from when the reader was created is
> fine.
>
> My question is whether it's ok to disable read locks in this scenario?
> What are read locks protecting?
>
> Best,
> Shawn.




Read locks on indexes

Posted by Shawn Konopinsky <sk...@blueprint.org>.

Hi,

I have a question regarding read locks on indexes. I have the situation 
where I have n applications (separate JVMs) running queries. These 
applications are read-only, and never use an IndexWriter.

The index is only ever updated using rsync. The applications don't need 
up-to-the-minute updates; the data as of when the reader was created is 
fine.

My question is whether it's ok to disable read locks in this scenario? 
What are read locks protecting?

Best,
Shawn.


Re: QueryFilter vs CachingWrapperFilter vs RangeQuery

Posted by Chris Hostetter <ho...@fucit.org>.

: > executes the search, i would keep a static reference to SearchIndexer
: > and then when i want to invalidate the cache, set it to null or create

: design of your system.  But, yes, you do need to keep a reference to it
: for the cache to work properly.  If you use a new IndexSearcher
: instance (I'm simplifying here, you could have an IndexReader instance
: yourself too, but I'm ignoring that possibility) then the filtering
: process occurs for each search rather than using the cache.

Assuming you have a finite number of Filters, and assuming those Filters
are expensive enough to be worth it...

Another approach you can take to "share" the cache among multiple
IndexReaders is to explicitly call the bits method on your filter(s) once,
and then cache the resulting BitSet anywhere you want (e.g., serialize it to
disk if you so choose), and then implement a "BitsFilter" class that you
can construct directly from a BitSet regardless of the IndexReader.  The
downside of this approach is that it will *ONLY* work if you are certain
that the index is never being modified.  Each bit corresponds to an internal
document number, so if any documents get added, or the index gets
re-optimized, you must regenerate all of the BitSets.

(That's why the CachingWrapperFilter's cache is keyed off of the
IndexReader ... as long as you're re-using the same IndexReader, it knows
that the cached BitSet must still be valid, because an IndexReader
always sees the same index as when it was opened, even if another
thread/process modifies it.)


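    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Filter;

    class BitsFilter extends Filter {
      private final BitSet bits;
      public BitsFilter(BitSet bits) {
        this.bits = bits;
      }
      // Return a copy so no search can modify the shared, cached bits
      public BitSet bits(IndexReader r) {
        return (BitSet) bits.clone();
      }
    }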
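Since java.util.BitSet is Serializable, caching the bits "on disk" can be as simple as writing the object out with an ObjectOutputStream. The following is a minimal sketch, not part of Lucene: the class name `BitSetStore` is hypothetical, and so is the idea of storing an index version number next to the bits so that stale bits are rejected after the index changes. Where that version number comes from (however your application tracks index updates) is an assumption left open here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical helper: persists a filter's BitSet together with the index
// version it was computed against, and refuses to load bits for another version.
final class BitSetStore {
    static void save(File file, long indexVersion, BitSet bits) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeLong(indexVersion);  // version stamp first
            out.writeObject(bits);        // BitSet implements Serializable
        }
    }

    // Returns null when the stored bits were computed for a different index
    // version, meaning the filter has to be re-run against the current index.
    static BitSet load(File file, long currentIndexVersion)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            long storedVersion = in.readLong();
            if (storedVersion != currentIndexVersion) {
                return null;
            }
            return (BitSet) in.readObject();
        }
    }
}
```

A non-null result could then be passed to a BitsFilter like the one above; a null result is the "you must regenerate all of the BitSets" case.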




-Hoss



Re: QueryFilter vs CachingWrapperFilter vs RangeQuery

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 7, 2004, at 3:06 PM, Ben Rooney wrote:
> i'm trying to understand the difference/effects between QueryFilter vs
> CachingWrapperFilter and when you would use one vs the other and how
> they work exactly.

QueryFilter caches the results of a query (a BitSet of the matching 
documents), keyed by IndexReader.

CachingWrapperFilter does not actually do any filtering of its own; it 
merely wraps another, non-caching filter, such as DateFilter, and caches 
that filter's results.  CachingWrapperFilter was added to separate caching 
from filtering.  QueryFilter is the exception, as it came first and already 
does caching.  If you're using QueryFilter, there is no need to concern 
yourself with CachingWrapperFilter.
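The reader-keyed caching described here can be modeled in a few lines of plain Java. This sketch does not use Lucene: `BitsProducer` and `CachingBitsWrapper` are hypothetical stand-ins for a filter and a caching wrapper, and a plain `Object` plays the role of the IndexReader. It illustrates the general technique (memoizing a filter's BitSet per reader in a WeakHashMap, so an entry can be dropped once its reader is unreachable) rather than reproducing Lucene's own source.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a filter: computes the allowed doc numbers for a reader.
interface BitsProducer {
    BitSet bits(Object reader);
}

// Hypothetical model of a caching wrapper: no filtering of its own,
// it only remembers the wrapped producer's result per reader instance.
final class CachingBitsWrapper implements BitsProducer {
    private final BitsProducer inner;
    // Weak keys: once a reader is unreachable, its cached BitSet can be collected.
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    CachingBitsWrapper(BitsProducer inner) {
        this.inner = inner;
    }

    @Override
    public synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = inner.bits(reader);  // the expensive work runs only on a miss
            cache.put(reader, cached);
        }
        return cached;
    }
}

public class CachingDemo {
    static int computations = 0;

    public static void main(String[] args) {
        BitsProducer expensive = reader -> {
            computations++;
            BitSet b = new BitSet();
            b.set(0, 5);  // pretend documents 0..4 match
            return b;
        };
        CachingBitsWrapper filter = new CachingBitsWrapper(expensive);

        Object readerA = new Object();  // stands in for one open IndexReader
        filter.bits(readerA);
        filter.bits(readerA);           // same reader: served from the cache
        Object readerB = new Object();  // a new reader: cache miss, recomputed
        filter.bits(readerB);

        System.out.println(computations);  // prints 2
    }
}
```

Reusing `readerA` hits the cache, while handing in a new reader object forces the computation again, which mirrors what happens when a new IndexSearcher (and therefore a new IndexReader) is used.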

> also, when exactly will the cache be cleared.  looking at the source
> code, it appears when the IndexReader is released it would be cleared.
> does this mean i should keep a reference to the SearchIndexer until i
> want the results to be cleared?  for example, in a class file the
> executes the search, i would keep a static reference to SearchIndexer
> and then when i want to invalidate the cache, set it to null or create a
> new instance of it?

How you keep a reference to the IndexSearcher instance is up to the 
design of your system.  But, yes, you do need to keep a reference to it 
for the cache to work properly.  If you use a new IndexSearcher 
instance (I'm simplifying here, you could have an IndexReader instance 
yourself too, but I'm ignoring that possibility) then the filtering 
process occurs for each search rather than using the cache.
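One way to act on this without a bare static field is a small holder that hands out the same searcher until you explicitly ask for a fresh one. This is a generic, hypothetical sketch: `SearcherHolder` and its `opener` callback are not Lucene classes. In real code `S` would be your IndexSearcher, and the replaced instance would still need to be closed once no in-flight searches are using it.

```java
import java.util.function.Supplier;

// Hypothetical holder: all callers share one searcher instance, so any
// reader-keyed filter cache stays warm until invalidate() swaps it out.
final class SearcherHolder<S> {
    private final Supplier<S> opener;  // e.g. a callback that opens a new searcher
    private S current;

    SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
    }

    // Returns the shared instance, opening it lazily on first use.
    synchronized S get() {
        if (current == null) {
            current = opener.get();
        }
        return current;
    }

    // Replaces the shared instance; filters cached against the old one will be
    // recomputed for the new one. Returns the old instance (possibly null) so
    // the caller can close it after any searches in progress have finished.
    synchronized S invalidate() {
        S old = current;
        current = opener.get();
        return old;
    }
}
```

Handing back the old instance instead of closing it inside `invalidate()` is a deliberate choice: another thread may still be iterating over results from it.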

	Erik
