Posted to dev@jackrabbit.apache.org by Christoph Kiehl <ki...@subshell.com> on 2006/09/01 16:17:01 UTC
Re: Jackrabbits own FileSystem and unit tests
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> I like the idea of having a transactional index but I don't think it's
>> a good idea to read this index from a binary property in a database,
>> because in our case we've got a fairly large repository with index
>> files 40MB in size. As far as I understand you have to
>> transfer 40MB to the database on every index change that gets
>> committed. Am I right?
>
> In general, this is correct, but lucene is designed in a way that it
> never modifies an existing index file. if you have a 40 MB index segment
> file and you delete a document within that index, lucene will simply
> update a small separate file called <segment-name>.del, which is kept
> alongside the index. Adding a new document to an existing index segment
> is not possible; in that case lucene will create a new segment.
Ok. To get this working, you have to create at least one segment per
transaction, right? And index merging could be done in background? Sounds really
interesting. But if the blob values are cached locally they have to be
downloaded on startup first before the index starts to be fast. Or does the blob
cache survive restarts? Lots of questions ;)
Cheers,
Christoph
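[Editorial sketch, not part of the original message.] The segment behaviour Marcel describes in the quoted text above (existing segment files are never rewritten, deletes only touch a small <segment-name>.del file, adds create a new segment) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names (SegmentSketch, Segment, add, delete) are hypothetical and are not Lucene's or Jackrabbit's API, and in-memory lists stand in for files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

/** Toy model of write-once index segments. Hypothetical names, not Lucene's API. */
final class SegmentSketch {

    /** A segment's document list is fixed at creation; only its deletion marks change. */
    static final class Segment {
        // Written once, like the large segment file that is never modified.
        private final List<String> docs;
        // Small side structure, standing in for <segment-name>.del.
        private final BitSet deleted = new BitSet();

        Segment(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        /** Marks a document as deleted; the document list itself is untouched. */
        boolean delete(String doc) {
            int i = docs.indexOf(doc);
            if (i < 0 || deleted.get(i)) {
                return false;
            }
            deleted.set(i);
            return true;
        }

        int storedCount() { return docs.size(); }
        int liveCount() { return docs.size() - deleted.cardinality(); }
    }

    private final List<Segment> segments = new ArrayList<>();

    /** Adding documents never touches an existing segment; it creates a new one. */
    void add(List<String> docs) {
        segments.add(new Segment(docs));
    }

    /** Deleting only flips a mark in the first segment that holds the document. */
    boolean delete(String doc) {
        for (Segment s : segments) {
            if (s.delete(doc)) {
                return true;
            }
        }
        return false;
    }

    int segmentCount() { return segments.size(); }

    int liveCount() {
        int n = 0;
        for (Segment s : segments) n += s.liveCount();
        return n;
    }

    int storedCount() {
        int n = 0;
        for (Segment s : segments) n += s.storedCount();
        return n;
    }

    public static void main(String[] args) {
        SegmentSketch index = new SegmentSketch();
        index.add(Arrays.asList("a", "b", "c")); // first segment
        index.delete("b");                       // only the deletion marks change
        index.add(Arrays.asList("d"));           // a new segment, not an append
        System.out.println(index.segmentCount() + " segments, "
                + index.liveCount() + " live of " + index.storedCount() + " stored docs");
    }
}
```

In this model a commit only ever ships new data (a new segment or an updated deletion mark), never a rewrite of the large existing segment, which is the point Marcel makes about the 40MB file.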
Re: Jackrabbits own FileSystem and unit tests
Posted by Marcel Reutegger <ma...@gmx.net>.
sorry for the late response but I was on paternity leave...
Christoph Kiehl wrote:
> But isn't it necessary for the index data to be committed to the
> database/pm to get a transactional index? I mean if you commit the index
> changes from the redo.log in a new transaction you don't really gain
> anything compared to the current solution regarding transactional index
> behavior, do you?
I expect there will be a performance gain. simply appending to a log is
much faster than committing the inverted index, which possibly requires
segment optimizations in lucene.
regards
marcel
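[Editorial sketch, not part of the original message.] The redo.log idea discussed in this thread (a commit appends a small entry and updates an in-memory index, while a segment is only written later) might look roughly like the following. This is a sketch under assumptions, not Jackrabbit's actual implementation: RedoLogSketch and its methods are hypothetical names, in-memory lists stand in for the redo.log file and the persisted segments, and crash() merely simulates losing the volatile index.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy redo log: commits append small entries; segments are written only on flush. */
final class RedoLogSketch {
    // Stands in for an append-only redo.log file (assumed to survive a crash).
    private final List<String> redoLog = new ArrayList<>();
    // Volatile in-memory index; lost on a crash.
    private final List<String> inMemory = new ArrayList<>();
    // Stands in for index segments that have been persisted.
    private final List<List<String>> segments = new ArrayList<>();

    /** Commit: append one log entry and apply the change in memory. No segment is written. */
    void commit(String docId) {
        redoLog.add("ADD " + docId);
        inMemory.add(docId);
    }

    /** Flush: persist the in-memory documents as one new segment, then clear the log. */
    void flush() {
        if (inMemory.isEmpty()) {
            return;
        }
        segments.add(new ArrayList<>(inMemory));
        inMemory.clear();
        redoLog.clear();
    }

    /** Simulates a crash: the volatile index is gone, the log and segments remain. */
    void crash() {
        inMemory.clear();
    }

    /** Recovery: rebuild the in-memory index by replaying the log entries in order. */
    void recover() {
        inMemory.clear();
        for (String entry : redoLog) {
            if (entry.startsWith("ADD ")) {
                inMemory.add(entry.substring("ADD ".length()));
            }
        }
    }

    int segmentCount() { return segments.size(); }
    int logSize() { return redoLog.size(); }
    int pendingCount() { return inMemory.size(); }

    public static void main(String[] args) {
        RedoLogSketch index = new RedoLogSketch();
        index.commit("doc1");
        index.commit("doc2");
        System.out.println("segments after two commits: " + index.segmentCount());
        index.crash();
        index.recover();
        System.out.println("pending docs after recovery: " + index.pendingCount());
        index.flush();
        System.out.println("segments after flush: " + index.segmentCount());
    }
}
```

The sketch shows why the end of a transaction does not necessarily produce a new segment: the commit path only appends to the log. It does not address Christoph's question of whether the log itself takes part in the repository transaction.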
Re: Jackrabbits own FileSystem and unit tests
Posted by Christoph Kiehl <ki...@subshell.com>.
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> Ok. To get this working, you have to create at least one segment per
>> transaction, right?
>
> not necessarily. as an optimization the current implementation uses the
> redo.log to keep track of index modifications that were only done in
> memory. this means that at the end of a transaction there won't
> necessarily be a new index segment on disk.
But isn't it necessary for the index data to be committed to the database/pm to
get a transactional index? I mean if you commit the index changes from the
redo.log in a new transaction you don't really gain anything compared to the
current solution regarding transactional index behavior, do you?
>> And index merging could be done in background?
> index merging *is* already done in the background.
Yes, of course.
>> Sounds really interesting. But if the blob values are cached locally
>> they have to be downloaded on startup first before the index starts to
>> be fast.
>
> correct.
Hm, in our case this would mean downloading about 10GB on each restart :( Might
take a while ;)
Cheers,
Christoph
Re: Jackrabbits own FileSystem and unit tests
Posted by Marcel Reutegger <ma...@gmx.net>.
lots of answers...
Christoph Kiehl wrote:
> Ok. To get this working, you have to create at least one segment per
> transaction, right?
not necessarily. as an optimization the current implementation uses
the redo.log to keep track of index modifications that were only done
in memory. this means that at the end of a transaction there won't
necessarily be a new index segment on disk.
> And index merging could be done in background?
index merging *is* already done in the background.
> Sounds really interesting. But if the blob values are cached locally
> they have to be downloaded on startup first before the index starts to
> be fast.
correct.
> Or does the blob cache survive restarts?
no, it doesn't.
regards
marcel