Posted to java-user@lucene.apache.org by Paul Masurel <pa...@gmail.com> on 2016/08/22 06:31:50 UTC

Lucene commit

Hi,

If I understand correctly, Lucene indexing threads each work on their own
individual segment.
When a thread has enough documents in its segment, it flushes it to disk
and starts a new one.
But segments are only searchable once they are committed.

Now my question is, wouldn't it be nice to be able to set up Lucene so that
segments are made searchable as soon as they are flushed?

Commit would still play the role of a "checkpoint" in a hardware failure
scenario.
In that sense, this is different from the old "autocommit" feature.

Of course, this "searchable yet not committed flushed segment" leads to the
following weird behavior:
- documents can become searchable and, in case of failure, become
unsearchable again
(and then eventually searchable again if the client does its job properly
and reindexes the rolled-back documents).
- one document can become searchable after another one even though it was
added before it.

The benefit would be to reduce the average latency for a document
to become searchable, without hurting throughput by calling commit() too
frequently.

Regards,

Paul

Re: Lucene commit

Posted by Paul Masurel <pa...@gmail.com>.
Awesome! Thank you very much!



On Mon, Aug 22, 2016 at 3:45 PM, Christoph Kaser <lu...@iconparc.de>
wrote:

> Hello Paul,
>
> this is already possible using DirectoryReader.openIfChanged(indexReader,indexWriter).
> This will give you an indexreader that already "sees" all changes made by
> the writer (up to that point), even though the changes were not yet
> committed:
> https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged-org.apache.lucene.index.DirectoryReader-org.apache.lucene.index.IndexWriter-
>
> Regards,
> Christoph

Re: Lucene commit

Posted by Christoph Kaser <lu...@iconparc.de>.
Hello Paul,

this is already possible using
DirectoryReader.openIfChanged(indexReader, indexWriter). This will give
you an IndexReader that already "sees" all changes made by the writer
(up to that point), even though those changes have not yet been committed:
https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged-org.apache.lucene.index.DirectoryReader-org.apache.lucene.index.IndexWriter-
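
For illustration, here is a minimal sketch of how that call might be used. It is not taken from the thread: the class name NrtReaderSketch, the "id" field, and the temporary index directory are made up for the example, and it assumes a StandardAnalyzer is available on the classpath (in Lucene 6.x that class ships in the lucene-analyzers-common module). Note that commit() is never called before the document becomes visible.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical example class; the name and the "id" field are assumptions.
public class NrtReaderSketch {

    // Returns {docs visible before the add, docs visible after the refresh}.
    static int[] run() throws Exception {
        // Throwaway index location, used only for this sketch.
        Path indexPath = Files.createTempDirectory("nrt-sketch");
        try (Directory dir = FSDirectory.open(indexPath);
             IndexWriter writer = new IndexWriter(
                     dir, new IndexWriterConfig(new StandardAnalyzer()))) {

            // Reader obtained from the writer: it sees uncommitted changes.
            DirectoryReader reader = DirectoryReader.open(writer);
            try {
                int before = reader.numDocs();

                Document doc = new Document();
                doc.add(new StringField("id", "doc-1", Field.Store.YES));
                writer.addDocument(doc); // no commit() here

                // Returns null when nothing changed since 'reader' was opened.
                DirectoryReader newer =
                        DirectoryReader.openIfChanged(reader, writer);
                if (newer != null) {
                    reader.close(); // release the stale reader
                    reader = newer;
                }
                return new int[] { before, reader.numDocs() };
            } finally {
                reader.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] counts = run();
        System.out.println("visible before add: " + counts[0]);
        System.out.println("visible after refresh: " + counts[1]);
    }
}
```

In a long-running application the openIfChanged call would typically sit in a periodic refresh step rather than run after every add; how often to refresh is a latency versus cost trade-off that this sketch does not address.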

Regards,
Christoph
