Posted to dev@lucene.apache.org by Doug Cutting <cu...@apache.org> on 2004/03/08 21:25:15 UTC

compound format as default in 1.4?

[I moved this discussion to the developer list.]

My metric here is the rate of complaint.

I'm tired of hearing about "too many file handles" problems.  Usually 
it is caused by folks opening a new searcher for each query, and the 
garbage collector not collecting and closing the old ones fast enough, 
so it signals other problems with the application, but it is still 
annoying, and could be largely quashed.
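
A minimal sketch of the alternative to opening a searcher per query: every
query goes through one shared, lazily created instance.  FakeSearcher is a
hypothetical stand-in that only counts how many instances were opened; it is
not Lucene's searcher class, and the names here are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSearcherSketch {
    /** Hypothetical stand-in for a searcher; each instance represents a set of open files. */
    static final class FakeSearcher {
        static final AtomicInteger opened = new AtomicInteger();

        FakeSearcher() {
            opened.incrementAndGet(); // pretend the index files are opened here
        }

        String search(String query) {
            return "results for " + query;
        }
    }

    private static FakeSearcher shared; // guarded by the class lock

    /** Returns the single shared searcher, creating it on first use. */
    static synchronized FakeSearcher getSearcher() {
        if (shared == null) {
            shared = new FakeSearcher();
        }
        return shared;
    }

    public static void main(String[] args) {
        // 1000 queries, but only one searcher is ever opened.
        for (int i = 0; i < 1000; i++) {
            getSearcher().search("query " + i);
        }
        System.out.println("searchers opened: " + FakeSearcher.opened.get());
    }
}
```

With a new searcher per query, the count above would be 1000 instead of 1,
and each instance would hold its files open until it was closed or collected.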

By some definition, anything which causes so many repeated complaints is 
a bug, and should be fixed.  Even if it's really not a bug.  It pains 
users of Lucene.  It annoys developers of Lucene.

Think of it like mergeFactor, etc.: the default setting may not be the 
absolute fastest, but it is one that is likely to run well in most 
configurations and cause the least confusion.

Doug

Terry Steichen wrote:
> I tend to agree (but with the same uncertainty as to why I feel that way).
> 
> Regards,
> 
> Terry
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <ot...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Monday, March 08, 2004 2:34 PM
> Subject: Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
> 
> 
> 
>>I can't explain why, but I feel like the old index format should stay
>>by default.  I feel like I'd rather a (slightly) faster index, and
>>switch to the compound one when/IF I encounter problems, than have a
>>safer, but slower index, and never realize that there is a faster
>>option available.
>>
>>Weak argument, I know, but some instinct in me thinks that the current
>>mode should remain.
>>
>>Otis
>>
>>
>>--- Doug Cutting <cu...@apache.org> wrote:
>>
>>>hui wrote:
>>>
>>>>Index time: 
>>>>compound format is 89 seconds slower.
>>>>
>>>>compound format:
>>>>1389507 total milliseconds
>>>>non-compound format:
>>>>1300534 total milliseconds
>>>>
>>>>The index size is 85m with 4 fields only. The files are stored in
>>>>the index.
>>>>
>>>>The compound format has only 3 files and the other has 13 files.
>>>
>>>Thanks for performing this benchmark!
>>>
>>>It looks like the compound format is around 7% slower when indexing.
>>>To my thinking that's acceptable, given the dramatic reduction in file
>>>handles.  If folks really need maximal indexing performance, then they
>>>can explicitly disable the compound format.
>>>
>>>Would anyone object to making compound format the default for Lucene
>>>1.4?  This is an incompatible change, but I don't think it should break
>>>applications.
>>>
>>>Doug
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
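
For reference, the "89 seconds" and "around 7%" figures quoted above can be
reproduced from hui's two timings.  This is only a small arithmetic sketch;
the class and method names are illustrative, and the slowdown is taken
relative to the non-compound run.

```java
import java.util.Locale;

public class CompoundOverhead {
    /** Extra indexing time of the compound format, in milliseconds. */
    static long extraMillis(long compoundMs, long nonCompoundMs) {
        return compoundMs - nonCompoundMs;
    }

    /** Slowdown of the compound format relative to the non-compound run, in percent. */
    static double slowdownPercent(long compoundMs, long nonCompoundMs) {
        return 100.0 * (compoundMs - nonCompoundMs) / nonCompoundMs;
    }

    public static void main(String[] args) {
        long compoundMs = 1389507L;    // timing reported for the compound format
        long nonCompoundMs = 1300534L; // timing reported for the non-compound format

        // prints "88973 ms extra"
        System.out.println(extraMillis(compoundMs, nonCompoundMs) + " ms extra");
        // prints "6.8% slower"
        System.out.println(String.format(Locale.ROOT, "%.1f%% slower",
                slowdownPercent(compoundMs, nonCompoundMs)));
    }
}
```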


Re: compound format as default in 1.4?

Posted by Scott ganyo <sc...@ganyo.com>.
+1.  I agree with this.  Give the safest option to the general masses, and
let the "expert" users choose other options based on their level of
experience.

(BTW:  It seems that accessing this compound file format as a memory  
mapped file using the NIO library would be a natural fit for improving  
Lucene's memory footprint as well...)

Scott

Re: compound format as default in 1.4?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Just to weigh in with my opinion... the compound file format proves  
fine in my use of Lucene and I use it 'by default' already.  So I'm +1  
on making it the default behavior.

	Erik


Re: compound format as default in 1.4?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Eh, fine then! ;)
I am using the compound format in my apps, too.

Otis
