Posted to dev@lucene.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2007/01/11 02:25:27 UTC

Lockless commits -- great stuff!

Greets,

I've finished integrating the lockless commits concept into  
KinoSearch, and I wanted to pop in and say that it's a very nice  
piece of work.  Real outside-the-box thinking -- or at least outside  
my box.  :)  Nothing better than an innovation which solves  
long-standing problems AND allows you to eliminate large chunks of code!

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Lockless commits -- great stuff!

Posted by Doug Cutting <cu...@apache.org>.
Marvin Humphrey wrote:
> I'm writing a lot of KS 0.20 code with the notion that it will 
> be submitted to Lucy  [ ... ]

Friendly reminder: if this is eventually going to be contributed to 
Apache, you need to make sure that all contributions can be made under 
Apache's CLA.  This would be simplest if you don't accept other folks' 
contributions to KS; then you're the sole owner and can contribute it 
all to Apache.  But if you accept contributions from others into KS, 
then we'll need to get those folks to agree to Apache's terms before the 
code can be contributed to Apache.

Doug



Re: Lockless commits -- great stuff!

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Jan 16, 2007, at 2:30 PM, robert engels wrote:

> What is the problem with implementing the KinoSearch model for  
> Lucene? It seems this would solve nearly all of these issues in a  
> very straightforward way.

It's a major undertaking, and the only developer sufficiently  
motivated thus far has been me.  I have only so much time to dedicate  
to working on Java Lucene.

I also don't see how you do this without bytecount-based strings.

My general plan has been to force KS 0.20_01 out the door, then  
present it as a model.  But Michael McCandless has been accelerating  
the discussions around here and so it seemed better to get a word in  
while the window was open. :)

> It would seem that a very simple SortPool implementation that did  
> everything in memory would be ideal for Lucene server based  
> environments.

The external sorter keeps track of its memory consumption, and only  
flushes to disk when a user-settable threshold is exceeded.  You can  
set the threshold high if you like.
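
To make the idea concrete, here is a minimal sketch of a threshold-driven external sorter.  It is not the KinoSearch code: the class name SortPool is borrowed from the question above, and mem_threshold, the sys.getsizeof() size estimate, and the pickle-based run files are all assumptions made for illustration.

```python
import heapq
import pickle
import sys
import tempfile


class SortPool:
    """Hypothetical external sorter: buffers items in memory and spills
    a sorted run to disk once an estimated byte budget is exceeded."""

    def __init__(self, mem_threshold=16 * 1024 * 1024):
        self.mem_threshold = mem_threshold  # user-settable budget, in bytes
        self.buffer = []                    # items not yet flushed
        self.mem_used = 0                   # rough estimate of buffer size
        self.runs = []                      # temp files, one sorted run each

    def add(self, item):
        self.buffer.append(item)
        self.mem_used += sys.getsizeof(item)  # crude estimate, an assumption
        if self.mem_used > self.mem_threshold:
            self._flush()

    def _flush(self):
        # Sort the in-memory buffer and write it out as one run.
        run = tempfile.TemporaryFile()
        for item in sorted(self.buffer):
            pickle.dump(item, run)
        run.seek(0)
        self.runs.append(run)
        self.buffer = []
        self.mem_used = 0

    @staticmethod
    def _read_run(run):
        # Stream items back from a run file until it is exhausted.
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    def sorted_items(self):
        # Merge all on-disk runs with whatever is still in memory.
        streams = [self._read_run(run) for run in self.runs]
        streams.append(iter(sorted(self.buffer)))
        return heapq.merge(*streams)
```

With a high enough mem_threshold nothing is ever flushed and the whole sort happens in memory; with a low one, the same calls spill runs to disk and merge them on the way out.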

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





Re: Lockless commits -- great stuff!

Posted by robert engels <re...@ix.netcom.com>.
What is the problem with implementing the KinoSearch model for  
Lucene? It seems this would solve nearly all of these issues in a  
very straightforward way.

BTW, the KinoSearch model is nearly exactly what we did when our  
original implementation of IndexReader/Writer wrote directly to JDBC.

It would seem that a very simple SortPool implementation that did  
everything in memory would be ideal for Lucene server based  
environments.


On Jan 16, 2007, at 4:18 PM, Marvin Humphrey wrote:

> Late response...
>
> On Jan 12, 2007, at 3:02 AM, Michael McCandless wrote:
>
>> Now that readers are read-only, I think it makes sense to default the
>> write lock into the index directory, and as you describe, no longer
>> generate a "unique namespace" hash lock ID since the index dir gives
>> us that scoping.
>
> For the record, it was Doug who originally pointed out that we no  
> longer needed the lock dir.
>
>>> Well, I look forward to seeing whether you can suggest  
>>> improvements on some of the algos I'll bring up in this forum  
>>> once KS 0.20_01 is out.  :)
>>
>> I will try, but I'm already behind just trying to understand how we
>> could improve Lucene based on your current KS release!  Is there any
>> preview/general summary of what's being done for KS 2.0/Lucy?
>
> There isn't a general summary.  KS 0.20 moves away from Lucene in  
> several ways.  It's a bit of an experiment, and while a lot of it  
> will end up in Lucy, the more significant changes we'll just have  
> to see about.  Since I'm blessed and cursed as the sole coder on  
> KS, it's easier for me to JFDI and then present a fully documented,  
> tested, benchmarked, coherent codebase than to present something  
> half-baked and explain all the missing pieces.
>
> The main item from the current release is the KinoSearch merge  
> model.  There's a high-level description of the algorithm on the  
> Lucy wiki: <http://wiki.apache.org/lucy/KinoSearchMergeModel>.
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
>




Re: Lockless commits -- great stuff!

Posted by Marvin Humphrey <ma...@rectangular.com>.
Late response...

On Jan 12, 2007, at 3:02 AM, Michael McCandless wrote:

> Now that readers are read-only, I think it makes sense to default the
> write lock into the index directory, and as you describe, no longer
> generate a "unique namespace" hash lock ID since the index dir gives
> us that scoping.

For the record, it was Doug who originally pointed out that we no  
longer needed the lock dir.

>> Well, I look forward to seeing whether you can suggest  
>> improvements on some of the algos I'll bring up in this forum once  
>> KS 0.20_01 is out.  :)
>
> I will try, but I'm already behind just trying to understand how we
> could improve Lucene based on your current KS release!  Is there any
> preview/general summary of what's being done for KS 2.0/Lucy?

There isn't a general summary.  KS 0.20 moves away from Lucene in  
several ways.  It's a bit of an experiment, and while a lot of it  
will end up in Lucy, the more significant changes we'll just have to  
see about.  Since I'm blessed and cursed as the sole coder on KS,  
it's easier for me to JFDI and then present a fully documented,  
tested, benchmarked, coherent codebase than to present something  
half-baked and explain all the missing pieces.

The main item from the current release is the KinoSearch merge  
model.  There's a high-level description of the algorithm on the Lucy  
wiki: <http://wiki.apache.org/lucy/KinoSearchMergeModel>.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





Re: Lockless commits -- great stuff!

Posted by Yonik Seeley <yo...@apache.org>.
On 1/12/07, Michael McCandless <lu...@mikemccandless.com> wrote:
> Now that readers are read-only, I think it makes sense to default the
> write lock into the index directory, and as you describe, no longer
> generate a "unique namespace" hash lock ID since the index dir gives
> us that scoping.

+1

> Are there any reasons not to do this?  I will open a JIRA issue to
> track this.

I don't think there are any implications for Solr's current
replication scheme...
I don't *think* that making a hard-link copy of the index directory
(including the write-lock) is problematic, and distributing any
write-lock to searchers should also be harmless.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server



Re: Lockless commits -- great stuff!

Posted by Michael McCandless <lu...@mikemccandless.com>.
Marvin Humphrey wrote:
> 
> On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
> 
>> I too am happy that we have no more commit lock :)
> 
> Not just that.  :)
> 
> No more lock directory, since we can put write.lock in the index 
> directory itself.
> 
> No more lock file name munging, since lock files from different indexes 
> no longer need to avoid collisions within a shared namespace.
> 
> No more need to deal with any files outside of the index directory.
> 
> Those three changes have a bigger impact on Lucy than they do on Lucene, 
> and since I'm writing a lot of KS 0.20 code with the notion that it will 
> be submitted to Lucy, they're having an impact on what I'm doing right 
> now.  C doesn't provide a number of the dependencies needed to support 
> the old lock system, so we would either have had to include them, write 
> them ourselves, or supply the needed functionality via PITA callbacks to 
> the host language (Perl, Ruby, etc).
> 
> Since the lock directory lived in the system's tmp directory, we needed 
> code to discover where it was.  Now we don't.
> 
> The lock file name munging required a checksum string generator.  We 
> don't need that now.
> 
> Lastly, a failure of imagination had left me blind to the fact that we 
> didn't need sophisticated, portable filepath manipulating routines: just 
> knowing a directory separator suffices.  Previously, I'd wrapped Perl's 
> File::Spec::Functions to make catfile() and canonpath() available from 
> C.  That hadn't been necessary, because we could have built up the 
> lockfile paths given the location of the tmp directory and the dir_sep.  
> However, as is often the case, simplifying the implementation reveals 
> unnecessary cruft, and when all of a sudden everything ended up in one 
> directory with a splash, it became obvious that generating filepaths 
> didn't require heavy machinery.
> 
>> But I have to say the lockless changes pale in comparison to what you
>> have done/are doing with KinoSearch, specifically the clean merge
>> model with an external sorter and other related file format changes
>> look very interesting.

Ooh, excellent points!

In fact, we haven't done this follow-through for Lucene but I think we
now should?  I think having only one directory (the index directory)
where things happen, and a simple file name for the write lock
("write.lock"), is a great simplification for our users.

Now that readers are read-only, I think it makes sense to default the
write lock into the index directory, and as you describe, no longer
generate a "unique namespace" hash lock ID since the index dir gives
us that scoping.

Are there any reasons not to do this?  I will open a JIRA issue to
track this.
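
As a rough illustration of the proposal, the sketch below takes the write lock by atomically creating "write.lock" inside the index directory.  It is not Lucene's lock implementation; the function names obtain_write_lock and release_write_lock are hypothetical, and exclusive file creation is just one assumed way to get the mutual exclusion.

```python
import os

LOCK_NAME = "write.lock"  # fixed name; the index directory provides the scoping


def obtain_write_lock(index_dir):
    """Try to create write.lock inside index_dir; return True on success."""
    lock_path = os.path.join(index_dir, LOCK_NAME)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # writer per index directory can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_write_lock(index_dir):
    """Remove write.lock so another writer can obtain it."""
    os.remove(os.path.join(index_dir, LOCK_NAME))
```

Two different index directories can each hold a lock with the same file name, which is why no per-index hash is needed in the name.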

> Well, I look forward to seeing whether you can suggest improvements on 
> some of the algos I'll bring up in this forum once KS 0.20_01 is out.  :)

I will try, but I'm already behind just trying to understand how we
could improve Lucene based on your current KS release!  Is there any
preview/general summary of what's being done for KS 2.0/Lucy?  I tried
to quickly search the KS archives and look through Lucy's archives but
didn't find any solid hit.

Mike



Re: Lockless commits -- great stuff!

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:

> I too am happy that we have no more commit lock :)

Not just that.  :)

No more lock directory, since we can put write.lock in the index  
directory itself.

No more lock file name munging, since lock files from different  
indexes no longer need to avoid collisions within a shared namespace.

No more need to deal with any files outside of the index directory.

Those three changes have a bigger impact on Lucy than they do on  
Lucene, and since I'm writing a lot of KS 0.20 code with the notion  
that it will be submitted to Lucy, they're having an impact on what  
I'm doing right now.  C doesn't provide a number of the dependencies  
needed to support the old lock system, so we would either have had to  
include them, write them ourselves, or supply the needed  
functionality via PITA callbacks to the host language (Perl, Ruby, etc).

Since the lock directory lived in the system's tmp directory, we  
needed code to discover where it was.  Now we don't.

The lock file name munging required a checksum string generator.  We  
don't need that now.

Lastly, a failure of imagination had left me blind to the fact that  
we didn't need sophisticated, portable filepath manipulating  
routines: just knowing a directory separator suffices.  Previously,  
I'd wrapped Perl's File::Spec::Functions to make catfile() and  
canonpath() available from C.  That hadn't been necessary, because we  
could have built up the lockfile paths given the location of the tmp  
directory and the dir_sep.  However, as is often the case,  
simplifying the implementation reveals unnecessary cruft, and when  
all of a sudden everything ended up in one directory with a splash,  
it became obvious that generating filepaths didn't require heavy  
machinery.
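
A small sketch of the difference, under stated assumptions: the "lock-<digest>-write.lock" naming and the use of MD5 below are hypothetical stand-ins for the old name munging (all that is said above is that a checksum string generator was required), and both function names are invented for illustration.

```python
import hashlib


def old_style_lock_path(tmp_dir, index_dir, dir_sep="/"):
    """Assumed old scheme: a shared tmp directory, so the file name must
    encode which index it belongs to via a checksum of the index path."""
    digest = hashlib.md5(index_dir.encode("utf-8")).hexdigest()
    return tmp_dir + dir_sep + "lock-" + digest + "-write.lock"


def new_style_lock_path(index_dir, dir_sep="/"):
    """New scheme: the lock lives in the index directory, so a directory
    separator and a fixed file name are all that is needed."""
    return index_dir + dir_sep + "write.lock"
```

The second function needs no tmp-directory discovery, no checksum generator, and no path canonicalization, only plain string concatenation with the separator.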

> But I have to say the lockless changes pale in comparison to what you
> have done/are doing with KinoSearch, specifically the clean merge
> model with an external sorter and other related file format changes
> look very interesting.

Well, I look forward to seeing whether you can suggest improvements  
on some of the algos I'll bring up in this forum once KS 0.20_01 is  
out.  :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





Re: Lockless commits -- great stuff!

Posted by Michael McCandless <lu...@mikemccandless.com>.
Marvin Humphrey wrote:

> I've finished integrating the lockless commits concept into KinoSearch, 
> and I wanted to pop in and say that it's a very nice piece of work.  
> Real outside-the-box thinking -- or at least outside my box.  :)  
> Nothing better than an innovation which solves long-standing problems 
> AND allows you to eliminate large chunks of code!

Thanks Marvin!  I too am happy that we have no more commit lock :)

But I have to say the lockless changes pale in comparison to what you
have done/are doing with KinoSearch.  Specifically, the clean merge
model with an external sorter and the other related file format changes
look very interesting.

I have been wanting to catch up on your work here (ever since that
sneaky reference to the "Gordian knot" appeared a while back!) and
understand how we could similarly improve Lucene's approach to
merging, but haven't quite succeeded [yet].

Mike
