Posted to dev@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2009/07/14 21:08:46 UTC

addIndexes* blocks addDocuments calls

For replicating and general system performance, it would be good to offer a
way to addIndexes* without blocking the addition of more docs. This seems
doable somehow?

Re: addIndexes* blocks addDocuments calls

Posted by Jason Rutherglen <ja...@gmail.com>.
> EG you could imagine an addIndexes* call getting started,
> completing a few merges. Then, concurrently, CMS picks & chooses
> some of those added external segments to merge with some of the
> original segments. Then addIndexes hits an exception. What do we
> do?

Right, because we roll back all of the segmentInfos at once
when using transactions.

> If IndexWriter maintained "branches" of the segmentInfos we
> could actually rollback all changes, ie, remove what was done by
> addIndexes but retroactively preserve any segments created by
> other methods (flushing, other addIndexes calls, etc.).

I think on commit we would resolve all external and (with LUCENE-1313)
ramdir segments in the foreground. Otherwise they would be
merged in the background as usual?

So we'd add a new method addIndexesNoCommit?


On Tue, Jul 14, 2009 at 12:56 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
>
> I agree, there's no real reason why addIndexes can't run concurrently
> with other things.  It's just software ;)
>
> One challenge is the transactional guarantee that addIndexes provides,
> ie, it's all or none.  If there's an exception while adding, then
> nothing was added.
>
> But, that was added before autoCommit=false.  So perhaps we could
> relax that and expect the app to instead rely on the "global"
> transactional semantics provided by autoCommit=false.
>
> EG you could imagine an addIndexes* call getting started, completing a
> few merges.  Then, concurrently, CMS picks & chooses some of those
> added external segments to merge with some of the original segments.
> Then addIndexes hits an exception.  What do we do?
>
> If IndexWriter maintained "branches" of the segmentInfos we could
> actually rollback all changes, ie, remove what was done by addIndexes
> but retroactively preserve any segments created by other methods
> (flushing, other addIndexes calls, etc.).
>
> Mike
>
> On Tue, Jul 14, 2009 at 3:08 PM, Jason
> Rutherglen<ja...@gmail.com> wrote:
> > For replicating and general system performance, it would be good to offer a
> > way to addIndexes* without blocking the addition of more docs. This seems
> > doable somehow?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>



Re: addIndexes* blocks addDocuments calls

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Jul 21, 2009 at 7:26 PM, Jason
Rutherglen<ja...@gmail.com> wrote:
>> EG you could imagine an addIndexes* call getting started,
>> completing a few merges. Then, concurrently, CMS picks & chooses
>> some of those added external segments to merge with some of the
>> original segments. Then addIndexes hits an exception. What do we
>> do?
>
> An exception due to an IO error from the external dir?

Or OOME or a Lucene bug or something.

> We can
> abandon the merge and remove the external segments that failed
> from segmentInfos?

But that won't work in this case because some of the external segments
have been merged with "real" segments in the index.

>> If IndexWriter maintained "branches" of the segmentInfos we
>> could actually rollback all changes, ie, remove what was done by
>> addIndexes but retroactively preserve any segments created by
>> other methods (flushing, other addIndexes calls, etc.).
>
> Branches stored in a TreeMap? With the keys being the
> method+randomId that initiated them? (e.g. addIndexes12)

Actually I meant keeping track of the "genealogy" of how the merges
were done, plus holding an extra refCount against "real" segments that
had been merged with external-but-not-yet-committed segments.

So that if the addIndexes fails, yet we had merged [say] "real"
segments 1, 2, 3 with "external" segments 4 and 5, we could on failure
of addIndexes go "undo" that merge and put back segments 1, 2, 3.
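To make the bookkeeping concrete, here is a minimal single-threaded sketch in plain Java. It is a toy model, not IndexWriter code: MergeGenealogy and all of its methods are hypothetical names, segments are just strings, and deletes and file handling are ignored. It only shows the idea of remembering each merge's inputs and holding an extra refCount on "real" inputs until addIndexes either finishes or fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of merge genealogy; all names are hypothetical, not Lucene's. */
class MergeGenealogy {
    private final List<String> live = new ArrayList<>();               // current segment list
    private final Set<String> tainted = new HashSet<>();               // external, or derived from external
    private final Map<String, List<String>> parents = new HashMap<>(); // tainted merged segment -> its inputs
    private final Map<String, Integer> refCounts = new HashMap<>();    // extra refs on "real" inputs

    void addReal(String seg) { live.add(seg); }

    void addExternal(String seg) { live.add(seg); tainted.add(seg); }

    /** Replaces the inputs with the merged segment, recording genealogy if needed. */
    void merge(String merged, List<String> inputs) {
        live.removeAll(inputs);
        live.add(merged);
        boolean touchesExternal = false;
        for (String in : inputs) touchesExternal |= tainted.contains(in);
        if (!touchesExternal) return; // ordinary merge: nothing to undo later
        tainted.add(merged);
        parents.put(merged, new ArrayList<>(inputs));
        for (String in : inputs) {
            // Hold an extra ref so the "real" input survives until the outcome is known.
            if (!tainted.contains(in)) refCounts.merge(in, 1, Integer::sum);
        }
    }

    /** addIndexes failed: undo tainted merges and put the "real" inputs back. */
    void undoAddIndexes() {
        List<String> restored = new ArrayList<>();
        for (String seg : live) collectReal(seg, restored);
        live.clear();
        live.addAll(restored);
        release();
    }

    /** addIndexes succeeded: the merges stand, so the extra refs can be dropped. */
    void finishAddIndexes() { release(); }

    private void collectReal(String seg, List<String> out) {
        if (!tainted.contains(seg)) { out.add(seg); return; }
        List<String> inputs = parents.get(seg);
        if (inputs == null) return; // an external segment itself: drop it
        for (String in : inputs) collectReal(in, out);
    }

    private void release() { tainted.clear(); parents.clear(); refCounts.clear(); }

    int refCount(String seg) { return refCounts.getOrDefault(seg, 0); }

    List<String> liveSegments() { return new ArrayList<>(live); }
}

class GenealogyDemo {
    public static void main(String[] args) {
        MergeGenealogy g = new MergeGenealogy();
        for (String s : new String[] {"1", "2", "3"}) g.addReal(s);
        g.addExternal("4");
        g.addExternal("5");
        g.merge("m", Arrays.asList("1", "2", "3", "4", "5"));
        g.undoAddIndexes();                        // simulate addIndexes failing
        System.out.println(g.liveSegments());      // [1, 2, 3]
    }
}
```

The extra refCount is what would keep the files of segments 1, 2, 3 from being deleted while the merged segment is still provisional.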

Mike



Re: addIndexes* blocks addDocuments calls

Posted by Jason Rutherglen <ja...@gmail.com>.
> EG you could imagine an addIndexes* call getting started,
> completing a few merges. Then, concurrently, CMS picks & chooses
> some of those added external segments to merge with some of the
> original segments. Then addIndexes hits an exception. What do we
> do?

An exception due to an IO error from the external dir? We can
abandon the merge and remove the external segments that failed
from segmentInfos?

> If IndexWriter maintained "branches" of the segmentInfos we
> could actually rollback all changes, ie, remove what was done by
> addIndexes but retroactively preserve any segments created by
> other methods (flushing, other addIndexes calls, etc.).

Branches stored in a TreeMap? With the keys being the
method+randomId that initiated them? (e.g. addIndexes12)
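Roughly what I have in mind, as a plain-Java sketch. Nothing here is existing Lucene API: SegmentBranches and its methods are made-up names, segments are just strings, and a counter stands in for the random id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical branch bookkeeping; none of these names exist in Lucene. */
class SegmentBranches {
    // Key: initiating method plus an id, e.g. "addIndexes12"; value: segments it created.
    private final Map<String, List<String>> branches = new TreeMap<>();
    private int nextId = 0;

    /** Opens a new branch and returns its key. */
    synchronized String open(String method) {
        String key = method + (nextId++);
        branches.put(key, new ArrayList<>());
        return key;
    }

    synchronized void addSegment(String branch, String segment) {
        List<String> segs = branches.get(branch);
        if (segs == null) throw new IllegalArgumentException("unknown branch: " + branch);
        segs.add(segment);
    }

    /** Drops one branch; segments created by every other branch stay. */
    synchronized void rollback(String branch) {
        branches.remove(branch);
    }

    /** All live segments, in branch-key order. */
    synchronized List<String> liveSegments() {
        List<String> all = new ArrayList<>();
        for (List<String> segs : branches.values()) all.addAll(segs);
        return all;
    }
}

class BranchDemo {
    public static void main(String[] args) {
        SegmentBranches b = new SegmentBranches();
        String flush = b.open("flush");
        String add = b.open("addIndexes");
        b.addSegment(flush, "_1");
        b.addSegment(add, "_2");
        b.rollback(add);                       // the addIndexes branch failed
        System.out.println(b.liveSegments());  // [_1]
    }
}
```

This only works while each segment belongs to exactly one branch. Once a merge combines segments from two branches, dropping a key is no longer enough.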

On Tue, Jul 14, 2009 at 12:56 PM, Michael
McCandless<lu...@mikemccandless.com> wrote:
> I agree, there's no real reason why addIndexes can't run concurrently
> with other things.  It's just software ;)
>
> One challenge is the transactional guarantee that addIndexes provides,
> ie, it's all or none.  If there's an exception while adding, then
> nothing was added.
>
> But, that was added before autoCommit=false.  So perhaps we could
> relax that and expect the app to instead rely on the "global"
> transactional semantics provided by autoCommit=false.
>
> EG you could imagine an addIndexes* call getting started, completing a
> few merges.  Then, concurrently, CMS picks & chooses some of those
> added external segments to merge with some of the original segments.
> Then addIndexes hits an exception.  What do we do?
>
> If IndexWriter maintained "branches" of the segmentInfos we could
> actually rollback all changes, ie, remove what was done by addIndexes
> but retroactively preserve any segments created by other methods
> (flushing, other addIndexes calls, etc.).
>
> Mike
>
> On Tue, Jul 14, 2009 at 3:08 PM, Jason
> Rutherglen<ja...@gmail.com> wrote:
>> For replicating and general system performance, it would be good to offer a
>> way to addIndexes* without blocking the addition of more docs. This seems
>> doable somehow?
>



Re: addIndexes* blocks addDocuments calls

Posted by Michael McCandless <lu...@mikemccandless.com>.
I agree, there's no real reason why addIndexes can't run concurrently
with other things.  It's just software ;)

One challenge is the transactional guarantee that addIndexes provides,
ie, it's all or none.  If there's an exception while adding, then
nothing was added.

But, that was added before autoCommit=false.  So perhaps we could
relax that and expect the app to instead rely on the "global"
transactional semantics provided by autoCommit=false.
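As a rough illustration of what that relaxed contract would look like from the app's side, here is a toy model in plain Java. It is not Lucene code: ToyWriter and its methods are hypothetical stand-ins, and "segments" are just strings. addIndexes makes no all-or-none promise of its own; the app gets atomicity from commit() and rollback().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a writer with autoCommit=false semantics; not Lucene's IndexWriter. */
class ToyWriter {
    private List<String> committed = new ArrayList<>(); // last commit point
    private List<String> pending = new ArrayList<>();   // uncommitted working state

    synchronized void addDocument(String doc) {
        pending.add("doc:" + doc);
    }

    /** Adds segments one by one; a failure partway leaves the earlier ones in place. */
    void addIndexes(List<String> externalSegments) {
        for (String seg : externalSegments) {
            if (seg == null) {
                throw new IllegalStateException("simulated failure while adding");
            }
            // Lock per segment, so addDocument calls can interleave with a long addIndexes.
            synchronized (this) {
                pending.add("ext:" + seg);
            }
        }
    }

    synchronized void commit() { committed = new ArrayList<>(pending); }

    synchronized void rollback() { pending = new ArrayList<>(committed); }

    synchronized List<String> pendingView() { return new ArrayList<>(pending); }
}

class GlobalTxDemo {
    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        w.addDocument("a");
        w.commit();
        w.addDocument("b");
        try {
            w.addIndexes(Arrays.asList("s1", null, "s2")); // fails on the second segment
            w.commit();
        } catch (IllegalStateException e) {
            w.rollback(); // the app, not addIndexes, restores the last commit point
        }
        System.out.println(w.pendingView()); // [doc:a]
    }
}
```

Note the cost in this model: the rollback also discards document "b", which was added after the last commit but had nothing to do with the failed addIndexes.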

EG you could imagine an addIndexes* call getting started, completing a
few merges.  Then, concurrently, CMS picks & chooses some of those
added external segments to merge with some of the original segments.
Then addIndexes hits an exception.  What do we do?

If IndexWriter maintained "branches" of the segmentInfos we could
actually rollback all changes, ie, remove what was done by addIndexes
but retroactively preserve any segments created by other methods
(flushing, other addIndexes calls, etc.).

Mike

On Tue, Jul 14, 2009 at 3:08 PM, Jason
Rutherglen<ja...@gmail.com> wrote:
> For replicating and general system performance, it would be good to offer a
> way to addIndexes* without blocking the addition of more docs. This seems
> doable somehow?
