Posted to java-user@lucene.apache.org by lukes <ma...@gmail.com> on 2016/11/09 19:40:50 UTC

Getting list of committed documents

Hi all,

  I need some feedback on identifying the documents that were included in
a commit call on IndexWriter. Multiple threads keep adding documents to the
IndexWriter in parallel, and another thread wakes up every n minutes and
does the commit. Below are the points I need help on:

1) How do I disable auto flush / commit? On IndexWriterConfig I called
setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH) and also raised
setRAMBufferSizeMB to a high number, around 128 MB. Is this correct, or is
there some other knob I need to adjust?

2) How do I find out which documents were committed during a commit (so
certain actions can be taken, like removing them from the channel)? I tried
extending IndexWriter and overriding doAfterFlush, but I don't see any way
to get a handle on the documents that made it into this commit.

Any help is really appreciated.

Regards.  



--
View this message in context: http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents-tp4305258.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Getting list of committed documents

Posted by lukes <ma...@gmail.com>.
Hi,

Can anyone please make a suggestion or point me in the right direction?

Regards.





Re: Getting list of committed documents

Posted by lukes <ma...@gmail.com>.
Thanks Mike. Yeah, I saw the changelist you mentioned. Unfortunately I can't
upgrade to 6.2 because of stack limitations :(

Regards.





Re: Getting list of committed documents

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi lukes,

Sorry, this was a recent change in Lucene:
https://issues.apache.org/jira/browse/LUCENE-7302

You need to upgrade to at least 6.2 to see it.

And the long value that is returned is just an incrementing number,
incremented for every op (add, update, delete) that changes the index.

Mike McCandless

http://blog.mikemccandless.com


On Sat, Nov 12, 2016 at 3:40 PM, lukes <ma...@gmail.com> wrote:
> Hi Michael,
>
>   Thanks for the reply. Regarding IW (IndexWriter) returning a long sequence
> number, I looked at the signature of commit and it seems to return void. Can
> you please point me in the right direction? I am using Lucene 5.5.2. Also, is
> this number an aggregate of deletes, updates and new documents? Is it a count
> that increases over time, or the number of documents that made it into that
> commit only? Once you point me there, I can look into the details.
>
> Thanks a lot.
>
> Regards.



Re: Getting list of committed documents

Posted by lukes <ma...@gmail.com>.
Hi Michael,

  Thanks for the reply. Regarding IW (IndexWriter) returning a long sequence
number, I looked at the signature of commit and it seems to return void. Can
you please point me in the right direction? I am using Lucene 5.5.2. Also, is
this number an aggregate of deletes, updates and new documents? Is it a count
that increases over time, or the number of documents that made it into that
commit only? Once you point me there, I can look into the details.

Thanks a lot.

Regards.





Re: Getting list of committed documents

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi lukes,

First, IW never "auto commits".  The maxBufferedDocs/RAMBufferSizeMB
settings control when IW moves the recently indexed documents from RAM
to disk, but that moving, which writes new segment files, does not
commit them.  The new segments are on disk but not yet visible to an
external reader (unless you open a near-real-time reader from IW)
until you explicitly call IW.commit.
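A toy model can make the flush/commit distinction above concrete. This is a hypothetical sketch, not Lucene code: FlushVsCommitModel and its methods are invented for illustration, and its maxBufferedDocs threshold only stands in for IndexWriterConfig.setMaxBufferedDocs. (In real Lucene a flush can also be triggered by setRAMBufferSizeMB, and IndexWriterConfig refuses to have both triggers set to DISABLE_AUTO_FLUSH at once.)

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model, NOT Lucene's implementation: an automatic flush
 * turns buffered docs into a new segment, but only commit() publishes
 * segments to readers opened from outside the writer.
 */
final class FlushVsCommitModel {
    private final int maxBufferedDocs; // stands in for IndexWriterConfig.setMaxBufferedDocs
    private final List<String> ramBuffer = new ArrayList<>();
    private final List<List<String>> uncommittedSegments = new ArrayList<>();
    private final List<List<String>> committedSegments = new ArrayList<>();

    FlushVsCommitModel(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush(); // automatic flush: writes a segment, does NOT commit it
        }
    }

    private void flush() {
        if (!ramBuffer.isEmpty()) {
            uncommittedSegments.add(new ArrayList<>(ramBuffer));
            ramBuffer.clear();
        }
    }

    void commit() {
        flush(); // commit first flushes whatever is still buffered...
        committedSegments.addAll(uncommittedSegments); // ...then publishes all new segments
        uncommittedSegments.clear();
    }

    int flushedButUncommittedSegments() {
        return uncommittedSegments.size();
    }

    /** Number of docs a reader opened outside the writer would see right now. */
    int visibleToExternalReader() {
        int count = 0;
        for (List<String> segment : committedSegments) {
            count += segment.size();
        }
        return count;
    }
}
```

With maxBufferedDocs = 2, adding three documents causes one automatic flush, yet an external reader still sees nothing until commit() runs.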

Second, every IW operation returns a long sequence number, and so does
IW.commit, such that all sequence numbers <= the sequence number
returned from IW.commit "made it" into the index, and all other ops
did not make it.
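One way to use this on the application side is sketched below, assuming Lucene 6.2+, where addDocument/updateDocument/deleteDocuments and commit return these sequence numbers. CommittedOpsTracker is a hypothetical helper, not a Lucene class: each indexing thread records the number its op returned together with whatever identifies the source message, and the committing thread drains every entry whose number is <= the value commit returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical helper (not a Lucene class). Assumes Lucene 6.2+, where
 * IndexWriter write methods and commit() return a long sequence number.
 */
final class CommittedOpsTracker<T> {
    // Pending ops keyed by sequence number; sequence numbers are unique per op.
    private final ConcurrentSkipListMap<Long, T> pending = new ConcurrentSkipListMap<>();

    /** Called by an indexing thread with the number returned by e.g. addDocument. */
    void record(long seqNo, T item) {
        pending.put(seqNo, item);
    }

    /**
     * Called by the committing thread with the number returned by commit().
     * Removes and returns every item with seqNo <= commitSeqNo, in seqNo order.
     * An op recorded only after this drain runs stays pending and is returned
     * by a later drain instead: acknowledged late, never early.
     */
    List<T> drainCommitted(long commitSeqNo) {
        List<T> committed = new ArrayList<>();
        Map.Entry<Long, T> first;
        while ((first = pending.firstEntry()) != null && first.getKey() <= commitSeqNo) {
            // Remove entry by entry so a concurrent record() is never silently dropped.
            T item = pending.remove(first.getKey());
            if (item != null) {
                committed.add(item);
            }
        }
        return committed;
    }
}
```

In the committer thread this would be used as: take the long returned by writer.commit(), pass it to drainCommitted, and act on (e.g. remove from the channel) each returned item.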

You should be able to use this information to e.g. tell the channel
(e.g. a kafka queue) which offset your Lucene app has "durably"
consumed.
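If several indexing threads consume from the channel in parallel, the documents in one commit need not map to a contiguous range of offsets. A hypothetical sketch, assuming every channel message has a numeric offset that becomes durable once its document is in a commit: DurableOffsetWatermark (an invented name, not part of Lucene or any channel client) only advances past an offset once every lower offset is durable, so the offset it reports is always safe to acknowledge.

```java
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Lucene or any channel client). Reports the
 * highest offset W such that every offset from the start offset up to W is durable.
 */
final class DurableOffsetWatermark {
    // Durable offsets sitting above a gap, waiting for the gap to close.
    private final TreeSet<Long> durableAboveGap = new TreeSet<>();
    // Every offset <= watermark is durable; startOffset - 1 means "nothing yet".
    private long watermark;

    DurableOffsetWatermark(long startOffset) {
        this.watermark = startOffset - 1;
    }

    /** Marks offsets durable (e.g. their docs were in a commit); returns the new watermark. */
    synchronized long markDurable(Iterable<Long> offsets) {
        for (long offset : offsets) {
            if (offset > watermark) {
                durableAboveGap.add(offset);
            }
        }
        // Advance while the next expected offset has become durable.
        while (!durableAboveGap.isEmpty() && durableAboveGap.first() == watermark + 1) {
            watermark = durableAboveGap.pollFirst();
        }
        return watermark;
    }
}
```

Messages the app deliberately skips must also be marked, or the watermark stalls at the gap. With Kafka, the committed consumer offset is conventionally the next offset to read, so you would commit watermark + 1.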

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 9, 2016 at 2:40 PM, lukes <ma...@gmail.com> wrote:
> Hi all,
>
>   I need some feedback on identifying the documents that were included in
> a commit call on IndexWriter. Multiple threads keep adding documents to the
> IndexWriter in parallel, and another thread wakes up every n minutes and
> does the commit. Below are the points I need help on:
>
> 1) How do I disable auto flush / commit? On IndexWriterConfig I called
> setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH) and also raised
> setRAMBufferSizeMB to a high number, around 128 MB. Is this correct, or is
> there some other knob I need to adjust?
>
> 2) How do I find out which documents were committed during a commit (so
> certain actions can be taken, like removing them from the channel)? I tried
> extending IndexWriter and overriding doAfterFlush, but I don't see any way
> to get a handle on the documents that made it into this commit.
>
> Any help is really appreciated.
>
> Regards.
