Posted to java-user@lucene.apache.org by Avi Drissman <av...@baseview.com> on 2003/03/19 17:43:26 UTC

Putting the Lucene index into a database

I've successfully used Lucene to do indexing of about 50-100K files, 
and have been keeping the index on a local disk. It's time to move 
up, and now I'm planning to index from 100-500K files.

I'm trying to decide whether or not it pays to hold the index in our 
database. Our database (FrontBase) has decent blob support, and a 
~300 meg index likely wouldn't faze it, but I have some concerns.

First, I'm looking at Directory, and there are two functions:
* OutputStream createFile(String name)
* InputStream openFile(String name)

How much of the streams do they take advantage of? Does Lucene seek 
around? I'm concerned about huge re-writing of files.
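
To make the rewriting concern concrete, here is a rough sketch of one way a database-backed file could avoid rewriting a whole blob on every seek-and-write: store the file as fixed-size chunks, one row per chunk. Everything here is hypothetical (the ChunkedBlobFile class, the chunk size, and the in-memory map standing in for a table); it is not SQLDirectory's design or any Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a "file" stored as fixed-size chunks, the way rows in a table might hold it. */
class ChunkedBlobFile {
    private final int chunkSize;
    // Stand-in for a database table keyed by chunk number (an assumption, not a real schema).
    private final Map<Integer, byte[]> chunks = new HashMap<>();
    private long length = 0;
    private int chunkWrites = 0; // how many chunk "rows" have been written or rewritten

    ChunkedBlobFile(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Writes data at an arbitrary position, touching only the chunks it overlaps. */
    void write(long pos, byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            int chunkNo = (int) ((pos + offset) / chunkSize);
            int within = (int) ((pos + offset) % chunkSize);
            int n = Math.min(chunkSize - within, data.length - offset);
            byte[] chunk = chunks.computeIfAbsent(chunkNo, k -> new byte[chunkSize]);
            System.arraycopy(data, offset, chunk, within, n);
            chunkWrites++;
            offset += n;
        }
        length = Math.max(length, pos + data.length);
    }

    /** Reads len bytes starting at pos; regions never written read back as zeros. */
    byte[] read(long pos, int len) {
        byte[] out = new byte[len];
        int offset = 0;
        while (offset < len) {
            int chunkNo = (int) ((pos + offset) / chunkSize);
            int within = (int) ((pos + offset) % chunkSize);
            int n = Math.min(chunkSize - within, len - offset);
            byte[] chunk = chunks.get(chunkNo);
            if (chunk != null) {
                System.arraycopy(chunk, within, out, offset, n);
            }
            offset += n;
        }
        return out;
    }

    long length() { return length; }

    int chunkWrites() { return chunkWrites; }
}
```

With this layout a one-byte overwrite in the middle of a large file costs one chunk rewrite rather than a rewrite of the full blob. Whether that matters depends on how much seeking actually happens during writes.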

Second is speed. I was looking at SQLDirectory, and although I'd 
probably write my own (inspired by that), who's using it? How is the 
speed compared to flat-files?

Third is replication. We're aiming for a replicated environment. If 
we wanted to build the index on the disk rather than in the database, 
every server would have to keep its own copy. Does anyone have any 
experience in this?

Thanks.

Avi

-- 
Avi 'rlwimi' Drissman
avi@baseview.com
Argh! This darn mail server is trunca

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Putting the Lucene index into a database

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Adding a document does not necessarily cause existing index files to be
modified.  'not necessarily' because sometimes adding a document
triggers segment merging.  There is a recent (March 5th) article on
http://onjava.com about Lucene that talks more about that.
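
A toy model of that behaviour, for illustration only: each added document is written as a new one-document segment, and existing segments are never modified in place. When enough same-sized segments pile up, they are replaced by one newly written merged segment. The class name, the mergeFactor knob, and the merge rule are assumptions made for this sketch and are not claimed to match Lucene's actual merge logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of write-once segments; not Lucene's actual merge policy. */
class ToySegmentIndex {
    // Hypothetical knob: merge once this many trailing segments have the same size.
    private final int mergeFactor;
    // Each entry is the document count of one segment "file".
    private final List<Integer> segments = new ArrayList<>();
    private int filesWritten = 0;

    ToySegmentIndex(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    /** Adds one document as a brand-new segment, then merges if the rule applies. */
    void addDocument() {
        segments.add(1);
        filesWritten++;
        maybeMerge();
    }

    private void maybeMerge() {
        while (segments.size() >= mergeFactor) {
            int last = segments.size() - 1;
            int size = segments.get(last);
            boolean sameSize = true;
            for (int i = last - mergeFactor + 1; i <= last; i++) {
                if (segments.get(i) != size) {
                    sameSize = false;
                    break;
                }
            }
            if (!sameSize) {
                return;
            }
            // Drop the old segments and write one new, larger segment in their place.
            for (int i = 0; i < mergeFactor; i++) {
                segments.remove(segments.size() - 1);
            }
            segments.add(size * mergeFactor);
            filesWritten++;
        }
    }

    List<Integer> segmentSizes() { return new ArrayList<>(segments); }

    int filesWritten() { return filesWritten; }
}
```

In this model most adds write only one small new segment, but an occasional add triggers a cascade that rewrites a large share of the index as a new file. That occasional large write is the cost to weigh when the files live in database blobs.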

Otis

--- Avi Drissman <av...@baseview.com> wrote:
> At 10:54 AM -0800 3/19/03, you wrote:
> 
> >If your data will be changing frequently and all of the indices need
> >to be in sync all the time, then yes, probably, esp. if the changes
> >are frequent but small.
> 
> Hmm...
> 
> I think I need to rephrase my first question. Suppose I have a big 
> index. I add a document. What seems to me to happen is that the files 
> are read into memory, manipulated, and then written back out to disk 
> as new files. What doesn't appear to be happening is modification of 
> the files in-place. Is that true?
> 
> If so, I think that negatively impacts my ability to store the 
> indexes in the database...
> 
> Avi
> -- 
> Avi 'rlwimi' Drissman
> avi@baseview.com
> Argh! This darn mail server is trunca
> 


Re: Putting the Lucene index into a database

Posted by Avi Drissman <av...@baseview.com>.
At 10:54 AM -0800 3/19/03, you wrote:

>If your data will be changing frequently and all of the indices need
>to be in sync all the time, then yes, probably, esp. if the changes
>are frequent but small.

Hmm...

I think I need to rephrase my first question. Suppose I have a big 
index. I add a document. What seems to me to happen is that the files 
are read into memory, manipulated, and then written back out to disk 
as new files. What doesn't appear to be happening is modification of 
the files in-place. Is that true?

If so, I think that negatively impacts my ability to store the 
indexes in the database...

Avi
-- 
Avi 'rlwimi' Drissman
avi@baseview.com
Argh! This darn mail server is trunca



Re: Putting the Lucene index into a database

Posted by Otis Gospodnetic <ot...@yahoo.com>.
--- Avi Drissman <av...@baseview.com> wrote:
> At 10:25 AM -0800 3/19/03, you wrote:
> 
> >Haven't used it.  Reported speed (by the author) was poor.
> 
> Hm. Is that due to the implementation or possibly to the database?

Not sure.  The author may know.

> >I've done that.  I simply used scp to copy the index from the build
> >machine to a set of maybe a dozen servers.
> 
> Well, this data is going to be changing. I'd imagine that every 
> machine in the cluster does its own index maintenance. It's easier to 
> send a message such as "add document 5" around to each machine than 
> to shove a ~300mb index around.

If your data will be changing frequently and all of the indices need
to be in sync all the time, then yes, probably, esp. if the changes
are frequent but small.

Otis




Re: Putting the Lucene index into a database

Posted by Avi Drissman <av...@baseview.com>.
At 10:25 AM -0800 3/19/03, you wrote:

>Haven't used it.  Reported speed (by the author) was poor.

Hm. Is that due to the implementation or possibly to the database?

>I've done that.  I simply used scp to copy the index from the build
>machine to a set of maybe a dozen servers.

Well, this data is going to be changing. I'd imagine that every 
machine in the cluster does its own index maintenance. It's easier to 
send a message such as "add document 5" around to each machine than 
to shove a ~300mb index around.
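
A minimal sketch of that idea, with made-up names throughout (IndexUpdate, ReplicaNode, UpdateBroadcaster): only a small update message travels to each machine, and each machine applies it to its own local index. The in-process method call stands in for whatever network transport is actually used, and the set of document ids stands in for the real index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical update message: the only thing that crosses the network. */
final class IndexUpdate {
    enum Kind { ADD, DELETE }

    final Kind kind;
    final int docId;

    IndexUpdate(Kind kind, int docId) {
        this.kind = kind;
        this.docId = docId;
    }
}

/** Stand-in for one server's local index; a real node would call its indexer here. */
class ReplicaNode {
    private final TreeSet<Integer> indexedDocs = new TreeSet<>();

    void apply(IndexUpdate update) {
        if (update.kind == IndexUpdate.Kind.ADD) {
            indexedDocs.add(update.docId);
        } else {
            indexedDocs.remove(update.docId);
        }
    }

    TreeSet<Integer> indexedDocs() { return indexedDocs; }
}

/** Sends every update to every registered node, in the order received. */
class UpdateBroadcaster {
    private final List<ReplicaNode> nodes = new ArrayList<>();

    void register(ReplicaNode node) { nodes.add(node); }

    void broadcast(IndexUpdate update) {
        for (ReplicaNode node : nodes) {
            node.apply(update);
        }
    }
}
```

The sketch ignores lost or reordered messages and nodes that are down during a broadcast; a real setup would need some way to detect and repair a replica that has fallen out of sync.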

Avi
-- 
Avi 'rlwimi' Drissman
avi@baseview.com
Argh! This darn mail server is trunca



Re: Putting the Lucene index into a database

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Avi,

--- Avi Drissman <av...@baseview.com> wrote:
> I've successfully used Lucene to do indexing of about 50-100K files, 
> and have been keeping the index on a local disk. It's time to move 
> up, and now I'm planning to index from 100-500K files.
> 
> I'm trying to decide whether or not it pays to hold the index in our 
> database. Our database (FrontBase) has decent blob support, and a 
> ~300 meg index likely wouldn't faze it, but I have some concerns.
> 
> First, I'm looking at Directory, and there are two functions:
> * OutputStream createFile(String name)
> * InputStream openFile(String name)
> 
> How much of the streams do they take advantage of? Does Lucene seek 
> around? I'm concerned about huge re-writing of files.
> 
> Second is speed. I was looking at SQLDirectory, and although I'd 
> probably write my own (inspired by that), who's using it? How is the 
> speed compared to flat-files?

Haven't used it.  Reported speed (by the author) was poor.

> Third is replication. We're aiming for a replicated environment. If 
> we wanted to build the index on the disk rather than in the database, 
> every server would have to keep its own copy. Does anyone have any 
> experience in this?

I've done that.  I simply used scp to copy the index from the build
machine to a set of maybe a dozen servers.
You probably don't want to copy directly into the final destination
directory, but rather into a temp directory first, and then rename/move
it to the target directory (a rename is atomic and quick, esp. if on
the same disk, as opposed to a slow copy over the network).
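
On the receiving side, the copy-then-rename step could look roughly like the sketch below. It assumes the new index has already arrived in a local directory (for example via scp), that the index is a flat directory of files, and that staging and live directories sit on the same filesystem. The IndexPublisher class and its method names are invented for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class IndexPublisher {
    /**
     * Copies the files of sourceIndex into a staging directory next to liveIndex,
     * then swaps the staging directory into place using renames.
     */
    static void publish(Path sourceIndex, Path liveIndex) throws IOException {
        Path live = liveIndex.toAbsolutePath();
        Path parent = live.getParent();

        // Slow part: copy into a staging directory that no searcher is reading.
        Path staging = Files.createTempDirectory(parent, "index-staging-");
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceIndex)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, staging.resolve(f.getFileName().toString()));
                }
            }
        }

        // Quick part: renames on the same filesystem. Note that there is a brief
        // moment between the two renames when the live directory does not exist.
        Path old = null;
        if (Files.exists(live)) {
            old = parent.resolve(live.getFileName() + ".old-" + System.nanoTime());
            Files.move(live, old, StandardCopyOption.ATOMIC_MOVE);
        }
        Files.move(staging, live, StandardCopyOption.ATOMIC_MOVE);

        if (old != null) {
            deleteFlatDirectory(old);
        }
    }

    /** Deletes a directory that contains only files (no subdirectories). */
    private static void deleteFlatDirectory(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                Files.delete(f);
            }
        }
        Files.delete(dir);
    }
}
```

Creating the staging directory beside the live one keeps both renames on one filesystem, which is what makes them quick. Searchers that hold the old index open during the swap are not handled here.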

Otis

