Posted to dev@lucene.apache.org by Maik Schreiber <bZ...@iq-computing.de> on 2001/10/08 04:04:02 UTC

CachingDirectory contribution

A while back I wrote a CachingDirectory implementation for Lucene that
allows an index to be cached on a local machine other than the "root"
machine. This can be very useful for handling heavy load (such as David
Snyder's 13 GB index :-))
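
To give a rough idea of the approach (this is just an illustration of the
caching idea, not the actual CachingDirectory code): the first time a file
from the shared "root" index is read, it gets copied to a local cache, and
all later reads are served from the local copy. In plain java.io terms:

    import java.io.*;

    // Illustration only: cache files from the shared "root" index
    // directory on local disk the first time they are read.  (The class
    // name and layout here are made up for the example.)
    public class LocalFileCache {
        private File root;   // e.g. an NFS-mounted index directory
        private File cache;  // fast local disk

        public LocalFileCache(File root, File cache) {
            this.root = root;
            this.cache = cache;
            cache.mkdirs();
        }

        // Returns a locally cached copy of the named index file,
        // copying it from the root directory if missing or stale.
        public File getFile(String name) throws IOException {
            File local = new File(cache, name);
            File remote = new File(root, name);
            if (!local.exists() || local.lastModified() < remote.lastModified()) {
                copy(remote, local);
            }
            return local;
        }

        private static void copy(File src, File dst) throws IOException {
            InputStream in = new FileInputStream(src);
            OutputStream out = new FileOutputStream(dst);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
            } finally {
                in.close();
                out.close();
            }
        }
    }

The real classes do the same thing behind Lucene's Directory interface,
of course.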

I'd really love to see it included in the Lucene package (I'm okay with
it being put under the Apache license). If there's some interest, I could
provide the sources for the abstract CachingDirectory as well as the
concrete FSCachingDirectory and RAMCachingDirectory implementations.

All these classes compile fine with Lucene 1.2rc1, but I'd like to
further test them before I provide the sources.


In addition, I could also provide my OracleDirectory implementation,
which stores all index files in an Oracle database instead of a file
system structure. I haven't done a SQLServerDirectory, but I'm willing
to implement one as well :)
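
One straightforward way to store index files in a database is a table with
a name column and a BLOB column, streaming the bytes in and out over JDBC.
A simplified illustration (the table layout and class here are made up for
the example, not the actual code):

    import java.io.*;
    import java.sql.*;

    // Assumes a table like:
    //   CREATE TABLE index_files (name VARCHAR(255) PRIMARY KEY, data BLOB)
    public class BlobFileStore {
        private Connection conn;

        public BlobFileStore(Connection conn) {
            this.conn = conn;
        }

        // Writes the bytes of one index file into the BLOB column.
        public void writeFile(String name, byte[] data) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO index_files (name, data) VALUES (?, ?)");
            try {
                ps.setString(1, name);
                ps.setBinaryStream(2, new ByteArrayInputStream(data), data.length);
                ps.executeUpdate();
            } finally {
                ps.close();
            }
        }

        // Reads one index file back out of the BLOB column.
        public byte[] readFile(String name) throws SQLException, IOException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT data FROM index_files WHERE name = ?");
            try {
                ps.setString(1, name);
                ResultSet rs = ps.executeQuery();
                if (!rs.next()) {
                    throw new FileNotFoundException(name);
                }
                InputStream in = rs.getBinaryStream(1);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } finally {
                ps.close();
            }
        }
    }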

Any comments?

-- 
Maik Schreiber
IQ Computing - http://www.iq-computing.de
mailto: info@iq-computing.de


Re: CachingDirectory contribution

Posted by Otis Gospodnetic <ot...@yahoo.com>.
> This leads me to yet another of my burning questions:
> has anyone pushed Lucene to its limits yet? If so,
> what are they? What happens when Lucene hits its limit?
> Does it throw an exception? Core dump?

I haven't done that, but I went to a job interview a few months ago in
which I mentioned Lucene after hearing that they were using Verity.
This company had supposedly tried Lucene, but according to them they hit
a wall with it at some point and performance just dropped.
I do not know how large their indices were, but I suspect they were
quite large. This company indexes news articles, press releases, et
cetera, and keeps them around for a while (a few months or so).

Otis



Re: CachingDirectory contribution

Posted by Dave Kor <da...@yahoo.com>.
> A while back I wrote a CachingDirectory implementation for Lucene
> that allows an index to be cached on a local machine other than the
> "root" machine. This can be very useful for handling heavy load
> (such as David Snyder's 13 GB index :-))

13 GB is considered a light load for Lucene. I am
currently running a Lucene demo on my old but trusty
Pentium 120 MHz laptop with a 9 GB index. It takes
Lucene about 20 seconds to handle the very first
query, probably because it is loading the index into
memory. All subsequent queries are instantaneous.
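
By the way, an easy way to hide that start-up cost is to fire a throwaway
query right after the application comes up, so the index is already warm
before the first real user searches. Something like this, assuming the
usual org.apache.lucene packages (the path, field name, and query term
are just placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class WarmUp {
        public static void main(String[] args) throws Exception {
            // Open the searcher once at start-up and keep it around.
            IndexSearcher searcher = new IndexSearcher("/path/to/index");

            // Throwaway query just to pull the index into memory.
            Query query =
                QueryParser.parse("lucene", "contents", new StandardAnalyzer());
            Hits hits = searcher.search(query);
            System.out.println("warmed up, " + hits.length() + " hits");
        }
    }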

Anyway, I'm very curious as to how your directory
caching code works. What does it cache? Files?
Previously read data? Have you measured the
performance improvement gained by your caching
system, and if so, what index size did you use to
measure it?

If it does improve performance for huge indexes, I'll
+1. 

This leads me to yet another of my burning questions:
has anyone pushed Lucene to its limits yet? If so,
what are they? What happens when Lucene hits its limit?
Does it throw an exception? Core dump?


> In addition, I could also provide my OracleDirectory
> implementation, which stores all index files in an Oracle
> database instead of a file system structure. I haven't done a
> SQLServerDirectory, but I'm willing to implement one as well :)

I assume you're using BLOBs to store the index files?
What are the advantages of using an OracleDirectory
over just using the file system?



