Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2007/07/02 15:35:11 UTC
[VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Hi,
I'd like to commit LUCENE-843.
The patch has gone through a number of iterations, but the final
version that's there now (take9) is quite a bit cleaner & simpler than
the ones leading up to it, and I believe it is ready.
It provides solid indexing performance gains (between 2X and 8X), but it
is somewhat more complex than the current "single doc per segment"
approach and it does introduce a change to the index format (only when
autoCommit=false) whereby multiple segments can share a single set of
term vector & stored fields files.
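To make the shared-file idea concrete, here is a minimal sketch. It assumes each segment records which shared "doc store" (stored fields plus term vectors) it uses and the offset at which its own documents begin inside that store. The class, record, and field names (SharedDocStoreSketch, Segment, docStoreSegment, docStoreOffset) are illustrative assumptions, not a description of the committed patch.

```java
import java.util.List;

/**
 * Hypothetical model (not Lucene's actual classes) of several segments
 * sharing one set of stored-fields / term-vector files ("doc store").
 */
public class SharedDocStoreSketch {

    /** A segment's view of the shared doc store: which store, and where its docs start. */
    record Segment(String name, int docCount, String docStoreSegment, int docStoreOffset) {

        /** Maps a segment-local doc number to its entry in the shared doc store. */
        int toDocStoreIndex(int localDoc) {
            if (localDoc < 0 || localDoc >= docCount) {
                throw new IndexOutOfBoundsException("doc " + localDoc + " not in " + name);
            }
            return docStoreOffset + localDoc;
        }
    }

    public static void main(String[] args) {
        // Three flushed segments whose stored fields all live in the "_0" doc store.
        List<Segment> segments = List.of(
                new Segment("_0", 100, "_0", 0),
                new Segment("_1", 250, "_0", 100),
                new Segment("_2", 50, "_0", 350));

        for (Segment s : segments) {
            System.out.println(s.name() + " doc 0 -> " + s.docStoreSegment()
                    + " entry " + s.toDocStoreIndex(0));
        }
    }
}
```

The design point is that a segment no longer owns its stored-fields and term-vector files outright; it only needs the store's name plus an offset, so consecutive flushes can keep appending to the same files.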
Given that it's such a big change, I think (?) it's appropriate to ask
for a vote (only PMC member votes are binding) to make sure we have
consensus that this is net/net a good change for Lucene.
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Michael McCandless <lu...@mikemccandless.com>.
Ahh, right, I will update fileformats.xml & re-build html/PDF
(with Forrest 0.8) before committing.
The only downside I see now is that if you flush by RAM (which gives
best performance), you have to be very careful to work around
LUCENE-845 by also setting maxBufferedDocs to be something "around"
the right number. However this downside should go away once we
resolve LUCENE-845 (which is next on my stack, after the "multiple
writers over NFS" that's in progress now!).
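The workaround amounts to a dual flush trigger: buffered documents are written out as a new segment when either the RAM budget or the maxBufferedDocs cap is reached, whichever comes first. The stdlib-only sketch below models that under simplifying assumptions (a per-document byte estimate stands in for real RAM accounting, and the class and method names are hypothetical); it is not IndexWriter's code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical buffering policy (not Lucene's code): flush when buffered
 * bytes reach a RAM budget or the buffered doc count reaches a cap,
 * whichever happens first.
 */
public class FlushTriggerSketch {
    private final long ramBudgetBytes;
    private final int maxBufferedDocs;
    private long bufferedBytes = 0;
    private int bufferedDocs = 0;
    private final List<Integer> flushedSegmentSizes = new ArrayList<>();

    FlushTriggerSketch(long ramBudgetBytes, int maxBufferedDocs) {
        this.ramBudgetBytes = ramBudgetBytes;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers one document of the given estimated size, flushing if either limit is hit. */
    void addDocument(long docBytes) {
        bufferedBytes += docBytes;
        bufferedDocs++;
        if (bufferedBytes >= ramBudgetBytes || bufferedDocs >= maxBufferedDocs) {
            flush();
        }
    }

    /** Writes buffered docs out as one new segment (here we only record its doc count). */
    void flush() {
        if (bufferedDocs > 0) {
            flushedSegmentSizes.add(bufferedDocs);
        }
        bufferedBytes = 0;
        bufferedDocs = 0;
    }

    List<Integer> flushedSegmentSizes() {
        return flushedSegmentSizes;
    }

    public static void main(String[] args) {
        // 1,000-byte RAM budget and a doc-count cap of 4.
        FlushTriggerSketch w = new FlushTriggerSketch(1_000, 4);
        long[] docSizes = {100, 100, 100, 100, 600, 600, 50};
        for (long size : docSizes) {
            w.addDocument(size);
        }
        w.flush(); // final flush, as on close
        System.out.println(w.flushedSegmentSizes()); // prints [4, 2, 1]
    }
}
```

The sample run shows that flushing by RAM yields segments with differing doc counts, and that the doc-count cap is what bounds the small-document case.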
I will also plant a tag just before committing.
Thanks for reviewing, everyone! I will give it another day or so and
then commit.
Mike
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Grant Ingersoll <gs...@apache.org>.
Mike,
Nice piece of work here. One caveat, I think you mentioned you
needed to update fileformats.xml (don't forget to generate the site
and commit those changes too), but I don't see that in the patch.
Also, do you see any downsides to this patch? Do you think it would
ever be the case that a user would not benefit from it? If so,
it would probably be useful to document them.
Other than that, I am +1
Cheers,
Grant
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Yonik Seeley <yo...@apache.org>.
On 7/2/07, Michael McCandless <lu...@mikemccandless.com> wrote:
> I'd like to commit LUCENE-843.
+1
Awesome job!
> The patch has gone through a number of iterations but the final
> version that's there now (take9) is quite a bit cleaner & simpler than
> the ones leading up to it and I believe ready.
>
> It provides solid indexing performance gains (between 2X-8X), but, it
> is somewhat more complex than the current "single doc per segment"
> approach and it does introduce a change to the index format (only when
> autoCommit=false) whereby multiple segments can share a single set of
> term vector & stored fields files.
I'll miss the elegant single doc approach that's been with us for so
long, but one can't ignore the magnitude of these performance gains.
> Given that it's such a big change I think (?) it's appropriate to ask
> for a vote (only PMC member votes are binding) to make sure we have
> consensus that this is net/net a good change for Lucene.
IMO, there's no need to be that formal. A simple vote on the dev list
(non-committer votes are welcome and carry weight too), and if there's
a consensus then everything is good.
-Yonik
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Grant Ingersoll <gs...@apache.org>.
On Jul 2, 2007, at 4:18 PM, Yonik Seeley wrote:
> On 7/2/07, Grant Ingersoll <gs...@apache.org> wrote:
>> 2. or, at a minimum, do a tag of the trunk right before committing.
>> I just find explicit tags make it easier to rollback or compare diffs
>> if need be
>
> You can always use an explicit revision number, which is easy to find
> out from the bug, or you can even find the closest by time:
>
Yeah, I know you can do that, I just sometimes like explicit tags for
things of this magnitude.
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Yonik Seeley <yo...@apache.org>.
On 7/2/07, Grant Ingersoll <gs...@apache.org> wrote:
> 2. or, at a minimum, do a tag of the trunk right before committing.
> I just find explicit tags make it easier to rollback or compare diffs
> if need be
You can always use an explicit revision number, which is easy to find
out from the bug, or you can even find the closest by time:
svn info -r {2006-11-10T00:03:00Z} http://svn.apache.org/repos/asf
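A date-based revision spec like the one above resolves to the youngest revision committed as of that date. A hypothetical Java sketch of the same lookup, using TreeMap.floorEntry over a made-up commit log (the revision numbers and timestamps are sample data, not real Apache history):

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of "closest revision by time": pick the latest
 * revision committed at or before a given instant. Sample data only.
 */
public class RevisionByDateSketch {

    /** Returns the latest revision committed at or before {@code when}, or -1 if none. */
    static long revisionAt(TreeMap<Instant, Long> commitLog, Instant when) {
        Map.Entry<Instant, Long> e = commitLog.floorEntry(when);
        return e == null ? -1 : e.getValue();
    }

    public static void main(String[] args) {
        // Made-up commit log: commit time -> revision number.
        TreeMap<Instant, Long> log = new TreeMap<>();
        log.put(Instant.parse("2006-11-09T18:00:00Z"), 100L);
        log.put(Instant.parse("2006-11-09T23:59:00Z"), 101L);
        log.put(Instant.parse("2006-11-10T09:30:00Z"), 102L);

        // Same timestamp as in Yonik's svn command.
        System.out.println(revisionAt(log, Instant.parse("2006-11-10T00:03:00Z"))); // prints 101
    }
}
```

A sorted map keyed by commit time makes "closest at or before" a single floor lookup, which is why a timestamp is as good as a tag for locating the pre-commit state, though a tag is easier to remember.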
-Yonik
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Grant Ingersoll <gs...@apache.org>.
Also, is it worth considering a couple of things:
1. Do a build version release (e.g. 2.2.1) prior to committing; that
way we could isolate this change and do a separate 2.3 release. I
don't want to do releases just for the sake of releases, but I think
we should at least prepare people that the next release (i.e. the one
containing 843) has a significant change. I don't think this patch
warrants a major revision tick, but it does make sense to have people
really scrutinize it and to have them know that there are significant
gains to be had.
2. or, at a minimum, do a tag of the trunk right before committing.
I just find explicit tags make it easier to rollback or compare diffs
if need be.
Note these suggestions are by no means a judgment of the quality of
the patch, just some precautions before such a big change.
-Grant
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Grant Ingersoll <gs...@apache.org>.
On Jul 2, 2007, at 9:35 AM, Michael McCandless wrote:
> Hi,
>
> I'd like to commit LUCENE-843.
>
> The patch has gone through a number of iterations but the final
> version that's there now (take9) is quite a bit cleaner & simpler than
> the ones leading up to it and I believe ready.
>
> It provides solid indexing performance gains (between 2X-8X), but, it
> is somewhat more complex than the current "single doc per segment"
> approach and it does introduce a change to the index format (only when
> autoCommit=false) whereby multiple segments can share a single set of
> term vector & stored fields files.
>
+0 for now; I will try to review tonight or tomorrow night. From
what I gather from reading the issue, etc., it sounds great, and you
and others have put a lot of hard work into it. Also, from some
benchmarking I have done, it seems to sit well with the notion of
optimizing merge factor, etc. based on the amount of memory available.
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)
Posted by Doug Cutting <cu...@apache.org>.
+1 This is great work! Commit it.
Doug