Posted to dev@jackrabbit.apache.org by Christoph Kiehl <ki...@subshell.com> on 2006/09/01 16:17:01 UTC

Re: Jackrabbits own FileSystem and unit tests

Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> I like the idea of having a transactional index but I don't think it's 
>> a good idea to read this index from a binary property in a database, 
>> because in our case we've got a fairly large repository where we have 
>> index files with a size of 40MB. As far as I understand you have to 
>> transfer 40MB to the database on every index change that gets 
>> committed. Am I right?
> 
> In general, this is correct. but lucene is designed in a way that it 
> never modifies an existing index file. if you have a 40 MB index segment 
> file and you delete a document within that index, lucene will simply 
> update a small separate file, kept alongside the index, called 
> <segment-name>.del. Adding a new document to an existing index segment 
> is not possible, in that case lucene will create a new segment.
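
To illustrate the behaviour described above, here is a minimal Python sketch of write-once segments with a separate deletion marker. It is a toy model and not Lucene's API: the class names `Segment` and `Index`, the `_0`-style segment names, and the in-memory `deleted` set standing in for `<segment-name>.del` are all assumptions made for illustration.

```python
class Segment:
    """A batch of documents that is never rewritten once created."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = tuple(docs)  # immutable, like a segment file on disk
        self.deleted = set()     # stand-in for the small <segment-name>.del file

    def live_docs(self):
        # Deleted documents stay in the segment but are filtered out on read.
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


class Index:
    """A list of segments; adds create segments, deletes only mark entries."""

    def __init__(self):
        self.segments = []

    def add_documents(self, docs):
        # Adding never touches an existing segment: a new one is created.
        segment = Segment(f"_{len(self.segments)}", docs)
        self.segments.append(segment)
        return segment

    def delete(self, doc):
        # Deleting only updates the deletion marker of the owning segment.
        for segment in self.segments:
            for position, candidate in enumerate(segment.docs):
                if candidate == doc:
                    segment.deleted.add(position)

    def all_live_docs(self):
        return [d for s in self.segments for d in s.live_docs()]
```

In this model a delete against a large segment changes only the small `deleted` set, so only that marker would need to be written back to the database, not the segment itself.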

Ok. To get this working, you have to create at least one segment per 
transaction, right? And index merging could be done in background? Sounds really 
interesting. But if the blob values are cached locally they have to be 
downloaded on startup first before the index starts to be fast. Or does the blob 
cache survive restarts? Lots of questions ;)

Cheers,
Christoph

Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.

sorry for the late response but I was on paternity leave...

Christoph Kiehl wrote:
> But isn't it necessary for the index data to be committed to the 
> database/pm to get a transactional index? I mean if you commit the index 
> changes from the redo.log in a new transaction you don't really gain 
> anything compared to the current solution regarding transactional index 
> behavior, do you?

I expect there will be a performance gain. simply appending to a log 
is much faster than committing the inverted index, which possibly 
requires segment optimizations in lucene.

regards
  marcel

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.

Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> Ok. To get this working, you have to create at least one segment per 
>> transaction, right?
> 
> not necessarily. as an optimization the current implementation uses the 
> redo.log to keep track of index modifications that were only done in 
> memory. this means that at the end of a transaction there won't 
> necessarily be a new index segment on disk.

But isn't it necessary for the index data to be committed to the database/pm to 
get a transactional index? I mean if you commit the index changes from the 
redo.log in a new transaction you don't really gain anything compared to the 
current solution regarding transactional index behavior, do you?

>> And index merging could be done in background?
> index merging *is* already done in the background.

Yes, of course.

>> Sounds really interesting. But if the blob values are cached locally 
>> they have to be downloaded on startup first before the index starts to 
>> be fast.
> 
> correct.

Hm, for our case this would mean downloading about 10GB on each restart :( Might 
take a while ;)
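
For a rough sense of scale, the download time can be estimated with a few lines of Python. The 100 Mbit/s link speed is purely an assumption chosen for illustration (the thread does not mention any bandwidth), and the estimate ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Time to move size_bytes over a link, ignoring latency and overhead."""
    return size_bytes * 8 / bits_per_second


index_size = 10 * 10**9    # "about 10GB", taken as decimal gigabytes
link_speed = 100 * 10**6   # assumed 100 Mbit/s, not a figure from the thread
minutes = transfer_seconds(index_size, link_speed) / 60
```

Under these assumptions the transfer takes 800 seconds, a bit over 13 minutes per restart.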

Cheers,
Christoph

Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.

lots of answers...

Christoph Kiehl wrote:
> Ok. To get this working, you have to create at least one segment per 
> transaction, right?

not necessarily. as an optimization the current implementation uses 
the redo.log to keep track of index modifications that were only done 
in memory. this means that at the end of a transaction there won't 
necessarily be a new index segment on disk.
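
The idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not Jackrabbit's actual implementation: the class name `LoggedIndex`, the `flush_threshold` parameter, the `ADD`/`DEL` line format, and the in-memory `segments` list standing in for on-disk segments are all hypothetical.

```python
import os
import tempfile


class LoggedIndex:
    """Toy index that buffers changes in memory and records them in a redo log."""

    def __init__(self, log_path, flush_threshold=3):
        self.log_path = log_path
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.pending = []    # modifications held only in memory
        self.segments = []   # stand-in for index segments written to disk
        self._replay()

    def _replay(self):
        # After a restart, rebuild the in-memory state from the log.
        # Assumes document ids contain no whitespace.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    action, doc_id = line.split()
                    self.pending.append((action, doc_id))

    def commit(self, changes):
        # Make the transaction durable by appending to the log first.
        with open(self.log_path, "a", encoding="utf-8") as log:
            for action, doc_id in changes:
                log.write(f"{action} {doc_id}\n")
            log.flush()
            os.fsync(log.fileno())
        self.pending.extend(changes)
        if len(self.pending) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Write the buffered changes out as one new segment, then clear the log.
        self.segments.append(list(self.pending))
        self.pending.clear()
        open(self.log_path, "w", encoding="utf-8").close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "redo.log")
        index = LoggedIndex(path)
        index.commit([("ADD", "node-1")])   # committed, but no segment yet
        restarted = LoggedIndex(path)       # simulated restart replays the log
        recovered = list(restarted.pending)
        restarted.commit([("DEL", "node-1"), ("ADD", "node-2")])
        return recovered, len(restarted.segments), os.path.getsize(path)
```

Calling `demo()` shows a transaction that ends without a new segment, a restart that recovers the change from the log, and a later commit that crosses the threshold and produces a single segment covering several transactions.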

> And index merging could be done in background?

index merging *is* already done in the background.

> Sounds really interesting. But if the blob values are cached locally 
> they have to be downloaded on startup first before the index starts to 
> be fast.

correct.

> Or does the blob cache survive restarts?

no, it doesn't.

regards
  marcel