Posted to java-user@lucene.apache.org by Em <ma...@yahoo.de> on 2011/04/10 10:34:58 UTC

NRT consistency

Hello list,

I am currently trying to understand Lucene's near-real-time (NRT) feature, which
was covered in "Lucene in Action, Second Edition".

Let's say I have a distributed system with a master and a slave.

In Solr, replication works by checking for differences in the index
directory and transferring those differences to keep the indices consistent.

How is this possible within an NRT system? Is there any way to take
snapshots of the index writer's internal buffer and send them to
the slave?

Regards,
Em

--
View this message in context: http://lucene.472066.n3.nabble.com/NRT-consistency-tp2801878p2801878.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: NRT consistency

Posted by Michael McCandless <lu...@mikemccandless.com>.
That's a neat question (how to replicate the index changes that become
visible when a new NRT reader is opened).

The good news is that the segments are flushed "like normal", so they can be
replicated to the mirror(s).

The bad news is that the segments file is not written to disk -- it's
held only in RAM (inside the IndexWriter, and shared with the IndexReader
that's opened) as a SegmentInfos instance, so replication would somehow
have to get this in-memory segments file over to the mirrors, too.
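
To make the two halves of that answer concrete, here is a minimal Java sketch of what a mirror would need to receive. It is a toy model under stated assumptions, not Lucene code: the class NrtSnapshot, its fields, and the file names used with it are hypothetical stand-ins. The flushed files can be diffed and copied like any other index files, while the list of live segments (standing in for the in-memory SegmentInfos) has to be shipped to the mirror by some other channel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Lucene code: every name here is hypothetical.
final class NrtSnapshot {
    // Stand-in for the in-memory SegmentInfos: names of the live segments.
    final List<String> liveSegments;
    // Files the writer has already flushed to the master's index directory.
    final Set<String> flushedFiles;

    NrtSnapshot(List<String> liveSegments, Set<String> flushedFiles) {
        this.liveSegments = Collections.unmodifiableList(new ArrayList<>(liveSegments));
        this.flushedFiles = Collections.unmodifiableSet(new TreeSet<>(flushedFiles));
    }

    // Flushed files the mirror does not have yet; a plain file copy covers these.
    Set<String> filesToCopy(Set<String> mirrorFiles) {
        Set<String> missing = new TreeSet<>(flushedFiles);
        missing.removeAll(mirrorFiles);
        return missing;
    }
}
```

Copying the result of filesToCopy() is not enough on its own in this model: the mirror also needs liveSegments, because nothing on disk records which flushed segments belong to the current point-in-time view.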

Mike

http://blog.mikemccandless.com


Re: NRT consistency

Posted by Mark Miller <ma...@gmail.com>.
On Apr 11, 2011, at 2:41 PM, Otis Gospodnetic wrote:

> I think what's being described here is a lot like what I *think* ElasticSearch
> does, where there is no single master and index changes made to any node get
> propagated to N-1 other nodes (N=number of index replicas).  I'm not sure how it
> deals with situations where "incompatible" index changes are made to the same
> index via 2 different nodes at the same time.  Is that what vector clocks are
> about?

Right - you have to have some sort of conflict detection/resolution - Amazon Dynamo uses vector clocks for this.
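
As a rough illustration of how vector clocks detect such conflicts, here is a minimal Java sketch. It is a generic textbook-style version, not the code of Dynamo or of any search server; the class VectorClock and the node names are assumptions made for the example. Each node increments its own counter when it makes a change, and two clocks conflict when neither one dominates the other.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Generic vector clock sketch; names are hypothetical, not taken from any library.
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    // Record one local update made on the given node.
    VectorClock tick(String node) {
        counters.merge(node, 1, Integer::sum);
        return this;
    }

    // Independent copy, as if the state were shipped to another node.
    VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    // CONCURRENT means neither clock has seen all of the other's updates,
    // so the two changes conflict and need some resolution step.
    Order compare(VectorClock other) {
        boolean less = false;
        boolean greater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String node : nodes) {
            int mine = counters.getOrDefault(node, 0);
            int theirs = other.counters.getOrDefault(node, 0);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }
}
```

If two nodes each start from the same clock and then tick independently, compare() reports CONCURRENT, which is the "incompatible changes via 2 different nodes" case from the question. The clock only detects the conflict; deciding which change wins is a separate policy.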

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org


Re: NRT consistency

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I think what's being described here is a lot like what I *think* ElasticSearch
does, where there is no single master and index changes made to any node get
propagated to N-1 other nodes (N=number of index replicas).  I'm not sure how it
deals with situations where "incompatible" index changes are made to the same
index via 2 different nodes at the same time.  Is that what vector clocks are
about?

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Re: NRT consistency

Posted by Mark Miller <ma...@gmail.com>.
On Apr 11, 2011, at 1:05 PM, Em wrote:

> Thank you both!
> 
> Mark, could you explain what you mean? I have never heard of such an
> index splitter. BTW: The idea of having a segment per document sounds a lot
> like a recipe for a "too many file descriptors" exception :)

This is just an idea for rebalancing, I suppose. An index splitter lets you split up an index; there is a multi-pass splitter in contrib. So if you wanted to move a few documents around (to rebalance after a couple of servers go down, perhaps), you might split out another index containing just the docs you want to move, and then ship that already analyzed and indexed bunch of documents off to other servers.


Re: NRT consistency

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Apr 11, 2011 at 1:05 PM, Em <ma...@yahoo.de> wrote:

> Mike, as you said, the segments are flushed like normal.
> Let's say my server dies for whatever reason. When I restart it and
> reopen the index writer, does the IW delete the flushed files because they
> are not mentioned in the segments file, or how does Lucene handle this
> internally?

Right, it deletes all such segments, going back to the last successful commit().
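
The behavior described here can be pictured with a small Java simulation. This is a toy model under stated assumptions, not Lucene's actual cleanup code: the class ToyIndex and its methods are hypothetical, and segments are reduced to plain names. Flushing adds a segment to disk, commit() records which segments are durable, and reopening after a crash discards everything the last commit does not reference.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy simulation only; class and method names are hypothetical.
final class ToyIndex {
    private final Set<String> onDisk = new TreeSet<>();   // flushed segments
    private Set<String> committed = new TreeSet<>();      // segments named by the last commit
    private int nextSegment = 0;

    // Flush a new segment to disk without committing it.
    String flush() {
        String name = "_" + nextSegment++;
        onDisk.add(name);
        return name;
    }

    // Make everything currently on disk part of the durable commit point.
    void commit() {
        committed = new TreeSet<>(onDisk);
    }

    // Simulate a crash followed by reopening the writer:
    // segments not referenced by the last commit are deleted.
    Set<String> reopenAfterCrash() {
        Set<String> deleted = new TreeSet<>(onDisk);
        deleted.removeAll(committed);
        onDisk.retainAll(committed);
        return deleted;
    }

    Set<String> segments() {
        return new TreeSet<>(onDisk);
    }
}
```

In this model, flushing one segment, committing, flushing two more, and then calling reopenAfterCrash() removes the two later segments and leaves only the committed one. The practical consequence for NRT is that documents visible through an NRT reader are not durable until commit() succeeds.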

Mike


Re: NRT consistency

Posted by Em <ma...@yahoo.de>.
Thank you both!

Mark, could you explain what you mean? I have never heard of such an
index splitter. BTW: The idea of having a segment per document sounds a lot
like a recipe for a "too many file descriptors" exception :)

Mike, as you said, the segments are flushed like normal.
Let's say my server dies for whatever reason. When I restart it and
reopen the index writer, does the IW delete the flushed files because they
are not mentioned in the segments file, or how does Lucene handle this
internally?

Regards,
Em

--
View this message in context: http://lucene.472066.n3.nabble.com/NRT-consistency-tp2801878p2807475.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: NRT consistency

Posted by 张成 <zh...@gmail.com>.
This is something like Dynamo's pattern: for near-real-time search, we should
make N = W (N being the number of replicas and W the number of replicas that
must acknowledge a write).
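
One way to read the N = W suggestion is through the usual quorum rule for Dynamo-style systems: with N replicas, W write acknowledgements, and R replicas consulted per read, a read is guaranteed to overlap the latest acknowledged write when R + W > N. The Java sketch below only encodes that arithmetic; the class Quorum and its method names are hypothetical and not taken from any library.

```java
// Quorum arithmetic sketch; names are hypothetical.
final class Quorum {
    // True when any r replicas must include at least one of the w replicas
    // that acknowledged the latest write, out of n replicas in total.
    static boolean readSeesLatestWrite(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= w, r <= n");
        }
        return r + w > n;
    }

    // Smallest number of replicas a read must consult for a given n and w.
    static int minReadReplicas(int n, int w) {
        if (n < 1 || w < 1 || w > n) {
            throw new IllegalArgumentException("need 1 <= w <= n");
        }
        return n - w + 1;
    }
}
```

With N = W, minReadReplicas() returns 1, so a search served by any single replica already reflects every acknowledged update. The cost is that a write cannot be acknowledged while any replica is unreachable.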


Re: NRT consistency

Posted by Mark Miller <ma...@gmail.com>.
On Apr 10, 2011, at 4:34 AM, Em wrote:

> Hello list,
>
> I am currently trying to understand Lucene's near-real-time (NRT) feature, which
> was covered in "Lucene in Action, Second Edition".
>
> Let's say I have a distributed system with a master and a slave.
>
> In Solr, replication works by checking for differences in the index
> directory and transferring those differences to keep the indices consistent.
>
> How is this possible within an NRT system? Is there any way to take
> snapshots of the index writer's internal buffer and send them to
> the slave?

I think that for near real time, Solr index replication may not be appropriate. Though I think it would be cool to use Andrzej's mythical single-pass index splitter to create a single+ doc segment that could be shipped around.

A system that just sends each doc to each replica is probably going to work a lot better. That introduces other issues, of course, some of which we hope to alleviate with further SolrCloud work.
