Posted to dev@lucene.apache.org by Karsten Konrad <Ka...@xtramind.com> on 2003/10/10 17:15:26 UTC

AW: file handle changes

Hi,

I just downloaded the nightly build zip and am missing these patches
(like a lot of other Lucene users, probably). I am not using the
CVS, so how can I apply the file handle changes? 

Regards,

Karsten

-----Original Message-----
From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
Sent: Tuesday, September 23, 2003 01:16
To: Lucene Developers List
Subject: Re: file handle changes


Greetings again.

I've implemented the file handle reduction changes, roughly as proposed 
before. Here are the patches for your enjoyment! :)

------------------------------------------
SUMMARY:
The goal of this patch is to drastically reduce the number of file 
handles required by Lucene. This is achieved by reducing the number of 
files required by a single index segment from N to 1, where N depends on 
the number of indexed fields in the segment. Typically, one should see a 
drop in the number of file handles by an order of magnitude! It could 
even be greater for indexes that contain large numbers of indexed fields.
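
To make the N-to-1 claim concrete, here is a rough back-of-the-envelope
sketch. It assumes the multi-file format uses about seven fixed files per
segment plus one norms file per indexed field; that figure of seven and the
class and method names are illustrative assumptions, not something stated in
this message. The optional <id>.del file is ignored.

```java
// Rough file-count arithmetic for one segment. The constants are assumptions
// made for illustration, not values taken from the patch.
final class SegmentFileCount {

    // Assumed number of fixed per-segment files in the multi-file format.
    static final int FIXED_FILES = 7;

    // Multi-file format: fixed files plus one norms file per indexed field.
    static int multiFile(int indexedFields) {
        return FIXED_FILES + indexedFields;
    }

    // Compound format: a single <id>.cfs file, whatever the field count.
    static int compound() {
        return 1;
    }

    public static void main(String[] args) {
        int segments = 10; // hypothetical number of segments in the index
        for (int fields : new int[] {3, 20, 100}) {
            System.out.println(fields + " indexed fields, " + segments
                + " segments: " + segments * multiFile(fields)
                + " files vs " + segments * compound());
        }
    }
}
```

Under these assumptions the saving grows with the number of indexed fields,
which is why indexes with many fields benefit the most.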

The best part is that to take advantage of this feature, one simply 
needs to call setUseCompoundFiles(true) on an IndexWriter before putting 
documents into it. Everything else is automatic!

------------------------------------------
DETAILS:
The proposed implementation adds a new property to the IndexWriter -- 
get/setUseCompoundFiles(boolean). This property defaults to false, which 
is the existing behavior prior to this patch. If the property is set to 
true, all segments created by this IndexWriter will be of the "compound 
file" format. Compound file segments have only one main file - <id>.cfs. 
Document deletions are handled as before -- if documents from this 
segment are deleted, a second file named <id>.del is created (I didn't 
change this code).
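
The general idea of a compound file can be sketched as follows: several
named sub-files are packed into one container with a small directory in
front, so a reader needs only one file handle. This is a minimal in-memory
illustration under assumed names (CompoundFileSketch, pack, read) and an
invented layout; it is not the actual .cfs format used by the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: NOT the on-disk layout of Lucene's .cfs files.
final class CompoundFileSketch {

    // Pack named sub-files into one byte array: entry count, then a
    // (name, length) pair per entry, then the raw contents in the same order.
    static byte[] pack(Map<String, byte[]> files) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(files.size());
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
            }
            for (byte[] content : files.values()) {
                out.write(content);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read one sub-file back out of the packed bytes by name.
    static byte[] read(byte[] compound, String name) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(compound));
            int count = in.readInt();
            int skip = 0;    // data bytes that precede the wanted entry
            int wanted = -1; // length of the wanted entry, -1 until found
            for (int i = 0; i < count; i++) {
                String entry = in.readUTF();
                int length = in.readInt();
                if (wanted < 0) {
                    if (entry.equals(name)) {
                        wanted = length;
                    } else {
                        skip += length;
                    }
                }
            }
            if (wanted < 0) {
                throw new IllegalArgumentException("no such entry: " + name);
            }
            // The stream now sits at the start of the data area.
            in.skipBytes(skip);
            byte[] result = new byte[wanted];
            in.readFully(result);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sub-file names here are made up for the example.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("_1.fnm", new byte[] {1, 2, 3});
        files.put("_1.frq", new byte[] {4, 5});
        byte[] packed = pack(files);
        System.out.println("packed " + files.size() + " sub-files into "
            + packed.length + " bytes; _1.frq has "
            + read(packed, "_1.frq").length + " bytes");
    }
}
```

Because the sketch writes the whole container in one pass and never modifies
it afterwards, it also shows why a write-once compound file is simpler than
one that must support in-place writes.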

The get/setUseCompoundFiles setting can be changed at any time during
the existence of the IndexWriter and takes effect the next time
the IndexWriter merges segments in its target directory.
SegmentIndexReader can now work with either type of segment.

This change does not affect how the segments are handled in the 
temporary RAMDirectory used by the IndexWriter internally, only the 
final segments written to the target directory. Also, a given directory 
can contain both types of segments and everything works out automagically.

-----------------------------------------
I have also created a new JUnit test case to test these features, which
runs successfully. For the moment it creates files under the current
working directory in which JUnit is executed. I also converted some
of the older tests "XXXTest" into "TestXXX", and made sure they work
with the old implementation and the new one. These tests do not yet make
enough assert(...) calls, but they now execute twice: once with the
multi-file indexes and once with the new compound file indexes, and
assert that the output is the same. The old files are still there; I
just added new ones with the inverted names. In one case,
ThreadSafetyTest.java, I actually changed the file itself because I
thought this test took too long to run as an automatic test in JUnit.
The build.xml file required a small change to add a class from the
src/demo tree to the classpath.

----------------------------------------
Doug, I seriously considered keeping everything at the Directory level,
as you suggested. This would have been preferred, I agree, but I really
couldn't find a way to reconcile this approach with the other two goals
I had: (a) keep specific file extension knowledge at the lucene.index.*
level where it is now, and (b) avoid having to support writes to the
compound file.

----------------------------------------
I'm attaching the patches against the current Lucene CVS source 
(basically output of "cvs diff -Buw"). The files listed as "?" are new 
files and are also attached.

(BTW, there are currently two failures in the existing JUnit test cases, 
but they occur with or without these patches, as has already been noted 
by Otis, Doug and Eric).

Finally, I should theoretically have commit access to Lucene's CVS, but
I have not tried using it yet. If these changes seem ok, I could commit
them myself (provided I can find my password, etc., etc.).

Enjoy.
Dmitry.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: AW: file handle changes

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Assuming you got a compatible nightly build, you should be able to take
the attachment that Dmitry sent and use the 'patch' command to apply
Dmitry's patches.  You will also need to add the new classes from the
attachment to the source tree.  Then you can recompile, jar, etc.

Otis


