Posted to dev@lucene.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2007/01/11 02:25:27 UTC
Lockless commits -- great stuff!
Greets,
I've finished integrating the lockless commits concept into
KinoSearch, and I wanted to pop in and say that it's a very nice
piece of work. Real outside-the-box thinking -- or at least outside
my box. :) Nothing better than an innovation which solves long-
standing problems AND allows you to eliminate large chunks of code!
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Lockless commits -- great stuff!
Posted by Doug Cutting <cu...@apache.org>.
Marvin Humphrey wrote:
> I'm writing a lot of KS 0.20 code with the notion that it will
> be submitted to Lucy [ ... ]
Friendly reminder: if this is going to be eventually contributed to
Apache, you need to make sure that all contributions can be under
Apache's CLA. This would be simplest if you don't accept other folks'
contributions to KS; then you're the sole owner and can contribute it
all to Apache. But if you accept contributions from others into KS,
then we'll need to get those folks to agree to Apache's terms before the
code can be contributed to Apache.
Doug
Re: Lockless commits -- great stuff!
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Jan 16, 2007, at 2:30 PM, robert engels wrote:
> What is the problem with implementing the KinoSearch model for
> Lucene? It seems this would solve nearly all of these issues in a
> very straightforward way.
It's a major undertaking, and the only developer sufficiently
motivated thus far has been me. I have only so much time to dedicate
to working on Java Lucene.
I also don't see how you do this without bytecount-based strings.
My general plan has been to force KS 0.20_01 out the door, then
present it as a model. But Michael McCandless has been accelerating
the discussions around here and so it seemed better to get a word in
while the window was open. :)
> It would seem that a very simple SortPool implementation that did
> everything in memory would be ideal for Lucene server-based
> environments.
The external sorter keeps track of its memory consumption, and only
flushes to disk when a user-settable threshold is exceeded. You can
set the threshold high if you like.
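A minimal sketch of the threshold-driven external sorter described above, assuming byte-string items and a crude memory estimate (payload bytes only). The class and method names (`ExternalSorter`, `feed`, `sorted_items`) are hypothetical and are not KinoSearch's actual API; sorted runs are spilled with `pickle` purely for brevity.

```python
import heapq
import os
import pickle
import tempfile


class ExternalSorter:
    """Hypothetical sketch: buffer items in memory, spill a sorted run to
    disk whenever the (approximate) memory threshold is exceeded, and
    merge all runs at the end."""

    def __init__(self, mem_threshold: int = 1 << 20):
        self.mem_threshold = mem_threshold  # user-settable, in bytes
        self.buffer = []
        self.mem_used = 0
        self.run_paths = []

    def feed(self, item: bytes) -> None:
        self.buffer.append(item)
        self.mem_used += len(item)  # crude accounting: payload bytes only
        if self.mem_used > self.mem_threshold:
            self._flush()

    def _flush(self) -> None:
        # Sort the in-memory buffer and write it out as one sorted run.
        if not self.buffer:
            return
        self.buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "wb") as fh:
            for item in self.buffer:
                pickle.dump(item, fh)
        self.run_paths.append(path)
        self.buffer = []
        self.mem_used = 0

    @staticmethod
    def _read_run(path):
        # Stream a run back one item at a time.
        with open(path, "rb") as fh:
            while True:
                try:
                    yield pickle.load(fh)
                except EOFError:
                    return

    def sorted_items(self):
        # k-way merge of the in-memory remainder plus every on-disk run.
        self.buffer.sort()
        runs = [self._read_run(p) for p in self.run_paths]
        yield from heapq.merge(self.buffer, *runs)
        for p in self.run_paths:
            os.remove(p)
        self.run_paths = []
```

With a very large `mem_threshold`, nothing is ever spilled and the whole sort happens in memory, which is roughly the all-in-memory pool asked about above.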
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: Lockless commits -- great stuff!
Posted by robert engels <re...@ix.netcom.com>.
What is the problem with implementing the KinoSearch model for
Lucene? It seems this would solve nearly all of these issues in a
very straightforward way.
BTW, the KinoSearch model is nearly exactly what we did when our
original implementation of IndexReader/Writer wrote directly to JDBC.
It would seem that a very simple SortPool implementation that did
everything in memory would be ideal for Lucene server-based
environments.
On Jan 16, 2007, at 4:18 PM, Marvin Humphrey wrote:
> Late response...
>
> On Jan 12, 2007, at 3:02 AM, Michael McCandless wrote:
>
>> Now that readers are read-only, I think it makes sense to default the
>> write lock into the index directory, and as you describe, no longer
>> generate a "unique namespace" hash lock ID since the index dir gives
>> us that scoping.
>
> For the record, it was Doug who originally pointed out that we no
> longer needed the lock dir.
>
>>> Well, I look forward to seeing whether you can suggest
>>> improvements on some of the algos I'll bring up in this forum
>>> once KS 0.20_01 is out. :)
>>
>> I will try, but I'm already behind just trying to understand how we
>> could improve Lucene based on your current KS release! Is there any
>> preview/general summary of what's being done for KS 2.0/Lucy?
>
> There isn't a general summary. KS 0.20 moves away from Lucene in
> several ways. It's a bit of an experiment, and while a lot of it
> will end up in Lucy, the more significant changes we'll just have
> to see about. Since I'm blessed and cursed as the sole coder on
> KS, it's easier for me to JFDI and then present a fully documented,
> tested, benchmarked, coherent codebase than to present something
> half-baked and explain all the missing pieces.
>
> The main item from the current release is the KinoSearch merge
> model. There's a high-level description of the algorithm on the
> Lucy wiki: <http://wiki.apache.org/lucy/KinoSearchMergeModel>.
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
Re: Lockless commits -- great stuff!
Posted by Marvin Humphrey <ma...@rectangular.com>.
Late response...
On Jan 12, 2007, at 3:02 AM, Michael McCandless wrote:
> Now that readers are read-only, I think it makes sense to default the
> write lock into the index directory, and as you describe, no longer
> generate a "unique namespace" hash lock ID since the index dir gives
> us that scoping.
For the record, it was Doug who originally pointed out that we no
longer needed the lock dir.
>> Well, I look forward to seeing whether you can suggest
>> improvements on some of the algos I'll bring up in this forum once
>> KS 0.20_01 is out. :)
>
> I will try, but I'm already behind just trying to understand how we
> could improve Lucene based on your current KS release! Is there any
> preview/general summary of what's being done for KS 2.0/Lucy?
There isn't a general summary. KS 0.20 moves away from Lucene in
several ways. It's a bit of an experiment, and while a lot of it
will end up in Lucy, the more significant changes we'll just have to
see about. Since I'm blessed and cursed as the sole coder on KS,
it's easier for me to JFDI and then present a fully documented,
tested, benchmarked, coherent codebase than to present something half-
baked and explain all the missing pieces.
The main item from the current release is the KinoSearch merge
model. There's a high-level description of the algorithm on the Lucy
wiki: <http://wiki.apache.org/lucy/KinoSearchMergeModel>.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: Lockless commits -- great stuff!
Posted by Yonik Seeley <yo...@apache.org>.
On 1/12/07, Michael McCandless <lu...@mikemccandless.com> wrote:
> Now that readers are read-only, I think it makes sense to default the
> write lock into the index directory, and as you describe, no longer
> generate a "unique namespace" hash lock ID since the index dir gives
> us that scoping.
+1
> Are there any reasons not to do this? I will open a JIRA issue to
> track this.
I don't think there are any implications for Solr's current
replication scheme...
I don't *think* that making a hard-link copy of the index directory
(including the write-lock) is problematic, and distributing any
write-lock to searchers should also be harmless.
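A minimal sketch of a hard-link copy of an index directory, assuming a filesystem that supports hard links and an index directory containing only regular files. `hardlink_snapshot` is a hypothetical helper, not part of Solr's replication scripts; it only illustrates that write.lock, if present, is linked along with every other file.

```python
import os


def hardlink_snapshot(index_dir: str, snapshot_dir: str) -> list:
    """Hypothetical sketch: mirror every regular file in index_dir into a
    new snapshot_dir via hard links (no file data is copied).
    Returns the sorted list of linked file names."""
    os.makedirs(snapshot_dir, exist_ok=False)
    linked = []
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # write.lock is not special-cased: it is linked like any other file.
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked
```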
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
Re: Lockless commits -- great stuff!
Posted by Michael McCandless <lu...@mikemccandless.com>.
Marvin Humphrey wrote:
>
> On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
>
>> I too am happy that we have no more commit lock :)
>
> Not just that. :)
>
> No more lock directory, since we can put write.lock in the index
> directory itself.
>
> No more lock file name munging, since lock files from different indexes
> no longer need to avoid collisions within a shared namespace.
>
> No more need to deal with any files outside of the index directory.
>
> Those three changes have a bigger impact on Lucy than they do on Lucene,
> and since I'm writing a lot of KS 0.20 code with the notion that it will
> be submitted to Lucy, they're having an impact on what I'm doing right
> now. C doesn't provide a number of the dependencies needed to support
> the old lock system, so we would either have had to include them, write
> them ourselves, or supply the needed functionality via PITA callbacks to
> the host language (Perl, Ruby, etc).
>
> Since the lock directory lived in the system's tmp directory, we needed
> code to discover where it was. Now we don't.
>
> The lock file name munging required a checksum string generator. We
> don't need that now.
>
> Lastly, a failure of imagination had left me blind to the fact that we
> didn't need sophisticated, portable filepath manipulating routines: just
> knowing a directory separator suffices. Previously, I'd wrapped Perl's
> File::Spec::Functions to make catfile() and canonpath() available from
> C. That hadn't been necessary, because we could have built up the
> lockfile paths given the location of the tmp directory and the dir_sep.
> However, as is often the case, simplifying the implementation reveals
> unnecessary cruft, and when all of a sudden everything ended up in one
> directory with a splash, it became obvious that generating filepaths
> didn't require heavy machinery.
>
>> But I have to say the lockless changes pale in comparison to what you
>> have done/are doing with KinoSearch, specifically the clean merge
>> model with an external sorter and other related file format changes
>> look very interesting.
Ooh, excellent points!
In fact, we haven't done this follow-through for Lucene, but I think we
now should. Having only one directory (the index directory) where
things happen, and a simple file name for the write lock
("write.lock"), is a great simplification for our users.
Now that readers are read-only, I think it makes sense to default the
write lock into the index directory, and as you describe, no longer
generate a "unique namespace" hash lock ID since the index dir gives
us that scoping.
Are there any reasons not to do this? I will open a JIRA issue to
track this.
> Well, I look forward to seeing whether you can suggest improvements on
> some of the algos I'll bring up in this forum once KS 0.20_01 is out. :)
I will try, but I'm already behind just trying to understand how we
could improve Lucene based on your current KS release! Is there any
preview/general summary of what's being done for KS 2.0/Lucy? I tried
to quickly search the KS archives and look through Lucy's archives but
didn't find any solid hit.
Mike
Re: Lockless commits -- great stuff!
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
> I too am happy that we have no more commit lock :)
Not just that. :)
No more lock directory, since we can put write.lock in the index
directory itself.
No more lock file name munging, since lock files from different
indexes no longer need to avoid collisions within a shared namespace.
No more need to deal with any files outside of the index directory.
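One common way to implement a single write lock that lives inside the index directory is atomic exclusive file creation, sketched here under that assumption. `obtain_write_lock` and `release_write_lock` are hypothetical names, not Lucene or KinoSearch APIs, and the sketch ignores stale locks left behind by a crashed writer.

```python
import os

LOCK_NAME = "write.lock"  # lock file name used in the discussion


def obtain_write_lock(index_dir: str) -> bool:
    """Try to create write.lock inside the index directory.
    Returns True if we now hold the lock, False if another writer does."""
    path = os.path.join(index_dir, LOCK_NAME)
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_write_lock(index_dir: str) -> None:
    os.remove(os.path.join(index_dir, LOCK_NAME))
```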
Those three changes have a bigger impact on Lucy than they do on
Lucene, and since I'm writing a lot of KS 0.20 code with the notion
that it will be submitted to Lucy, they're having an impact on what
I'm doing right now. C doesn't provide a number of the dependencies
needed to support the old lock system, so we would either have had to
include them, write them ourselves, or supply the needed
functionality via PITA callbacks to the host language (Perl, Ruby, etc).
Since the lock directory lived in the system's tmp directory, we
needed code to discover where it was. Now we don't.
The lock file name munging required a checksum string generator. We
don't need that now.
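For contrast, here is a hypothetical reconstruction of the kind of name munging the old scheme required: because the lock lived in the shared tmp directory, its name had to embed a checksum of the index path. The `lucene-<md5>-write.lock` pattern below is an assumption, loosely modeled on older Lucene releases, not a verified copy of any implementation.

```python
import hashlib
import os
import tempfile


def old_style_lock_path(index_dir: str) -> str:
    # Assumed old scheme: discover the tmp dir, then munge the lock name with
    # a checksum of the canonical index path so indexes don't collide there.
    canonical = os.path.realpath(index_dir)
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return os.path.join(tempfile.gettempdir(), f"lucene-{digest}-write.lock")


def new_style_lock_path(index_dir: str) -> str:
    # New scheme: the index directory itself provides the scoping.
    return os.path.join(index_dir, "write.lock")
```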
Lastly, a failure of imagination had left me blind to the fact that
we didn't need sophisticated, portable filepath manipulating
routines: just knowing a directory separator suffices. Previously,
I'd wrapped Perl's File::Spec::Functions to make catfile() and
canonpath() available from C. That hadn't been necessary, because we
could have built up the lockfile paths given the location of the tmp
directory and the dir_sep. However, as is often the case,
simplifying the implementation reveals unnecessary cruft, and when
all of a sudden everything ended up in one directory with a splash,
it became obvious that generating filepaths didn't require heavy
machinery.
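A minimal sketch of the point above: once every file lives in one directory, a path is just the directory, one separator, and a bare file name. `index_file_path` is a hypothetical helper; it assumes `filename` contains no separators and performs no normalization, which is exactly the heavier machinery it avoids.

```python
import os

DIR_SEP = os.sep  # the only platform detail this sketch relies on


def index_file_path(index_dir: str, filename: str) -> str:
    """Hypothetical helper: join an index directory and a bare file name
    by plain concatenation with the directory separator."""
    if index_dir.endswith(DIR_SEP):
        return index_dir + filename
    return index_dir + DIR_SEP + filename
```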
> But I have to say the lockless changes pale in comparison to what you
> have done/are doing with KinoSearch, specifically the clean merge
> model with an external sorter and other related file format changes
> look very interesting.
Well, I look forward to seeing whether you can suggest improvements
on some of the algos I'll bring up in this forum once KS 0.20_01 is
out. :)
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: Lockless commits -- great stuff!
Posted by Michael McCandless <lu...@mikemccandless.com>.
Marvin Humphrey wrote:
> I've finished integrating the lockless commits concept into KinoSearch,
> and I wanted to pop in and say that it's a very nice piece of work.
> Real outside-the-box thinking -- or at least outside my box. :)
> Nothing better than an innovation which solves long-standing problems
> AND allows you to eliminate large chunks of code!
Thanks Marvin! I too am happy that we have no more commit lock :)
But I have to say the lockless changes pale in comparison to what you
have done/are doing with KinoSearch, specifically the clean merge
model with an external sorter and other related file format changes
look very interesting.
I have been wanting to catch up on your work here (ever since that
sneaky reference to the "Gordian knot" appeared a while back!) and
understand how we could similarly improve Lucene's approach to
merging, but haven't quite succeeded [yet].
Mike