You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Tim Smith (JIRA)" <ji...@apache.org> on 2009/06/19 19:46:07 UTC

[jira] Created: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Add a waitForMerges() method to IndexWriter
-------------------------------------------

                 Key: LUCENE-1703
                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.4
            Reporter: Tim Smith


It would be very useful to have a waitForMerges() method on the IndexWriter.

Right now, the only way i can see to achieve this is to call IndexWriter.close()

ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter

the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
this could then result in the following sequence:
* close() - fails
* force unlock the write lock (per close() documentation)
* new IndexWriter() (acquires write lock)
* finalize() on old IndexWriter releases the write lock
* Index is now not locked, and another IndexWriter pointing to the same directory could be opened

If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter

If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721903#action_12721903 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

You can use ConcurrentMergeScheduler.sync()?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722065#action_12722065 ] 

Uwe Schindler commented on LUCENE-1703:
---------------------------------------

bq. I still don't understand how if autocommit=false, and addDocument/deleteDocument can trigger a merge.

Why not? I always use autocommit=false. After I added a number of documents to the index some of the existing not-yet committed segments get merged. It will never merge the current existing and committed segments, but the segments created during indexing can be merged. After commit() or close() the already existing and the new segments are maybe additionally merged and a new segments file written.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722274#action_12722274 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

bq. If we change the default to not wait, then I don't think matchVersion will matter. Since I'll need to update my code anyway, I don't think I'd care whether I call waitForMerges(), or matchVersion, even though the latter is called only once and the change is required in just one place. matchVersion, IMO, is required in more complicated cases, like that ACRONYM issue, which my code really has no control over.

But we can't suddenly not wait, if you call optimize(), right?  (That's too severe a change in runtime behavior) Ie, we'd have to rename it to something else, and deprecate the old one.

Actually... I'm no longer so sure we should make this change to the default.  I think the current default (optimize(), addIndexes*(), expungeDeletes(), etc., all wait until they're "done") is actually a good default?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith updated LUCENE-1703:
------------------------------

    Attachment: IndexWriter.java.diff

Here's a diff for IndexWriter.java

moved code from else block in finishMerges() into public waitForMerges() method

this should wait for any outstanding merges to be complete prior to exiting

will see if i can also work up a test case

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722019#action_12722019 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

{quote}
MergeScheduler does not provide a sync() method in its abstract class. Therefore, an implementation could be used that would not derive from or provide a sync() method

If it did, this would be a more than acceptable solution instead of putting the method on the IndexWriter. (i potentially want to allow configuration to specify what MergeScheduler to use, and if sync() isn't specified on the base class, i can't reliably use it (will have to do instanceof checks, and a new concurrent based scheduler not derived from ConcurrentMergeScheduler breaks this)
{quote}

We can add a sync() method to MergeScheduler or you do it on your side. There are only two MS today: CMS and SMS. If you're going to write your own CMS variant, then you can have it extend CMS, or define your own interface with a sync() method, which will delegate that call to whatever MS it wraps. Again, the way I see it, you know which MS you instantiate, and therefore you should be able to declare its type and use whatever methods it has. If you want something configurable, I suggest we move sync() to MS (w/ a default impl of doing nothing, for back-compat).

What do you think?

bq. ... another thread calling addDocument triggering a merge

This will only happen if you use autoCommit=true, right? You can use autoCommit=true, and when you call commit(), use the one that waits for merges (the one that does not exist yet :) ), and sync all your addDoc threads with that current commit thread (make sure you use a ReadWriteLock, or otherwise all your addDocs are going to sync on each other).

bq. this will allow blocking adding more documents until background merges are complete periodically

What's "periodically"? If you're using autoCommit=false, those merges cannot happen suddenly. There are only 4 methods that can trigger them (I think?) and you can sync on calling them w/ the rest of the addDoc threads.

Don't get me wrong - I just play the devil's advocate here. Recently, we've had this thread http://www.nabble.com/ReadOnly-IndexReaders-td23787255.html - a naive question on adding an ever more naive method to IndexReader - isReadOnly. If you read it, you'll see that it wasn't accepted at the end, because we've understood it will just try to cover a design flaw of your system.

Therefore I'm just trying to verify whether this method is absolutely necessary, and there aren't any "design decisions" we should make in our apps to better handle that case.

Having said that, I'm not a committer and if one decides it's important enough, one would take this issue and commit it. I just enjoy the discussion  for now (hope you too :) ).

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1703.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9

Thanks Tim!

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722037#action_12722037 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

NOTE: I'm always using autoCommit=false (autoCommit=true is deprecated anyway)

however, i could potentially have 2 threads feeding the index (in my custom code)
one thread may call addDocument() (or maybeMerge() to be more to the point)
this thread could result in the SerialMergeScheduler to start merging (addDocument() won't return until this merge completes)
I then want thread 2 to call waitForMerges(), at which point it will wait till the first thread will have finished its merges (at which point addDocument will have returned)

Obviously this is a contrived example as i personally will be locking the updates such that no addDocument() call could be in process when i want to call waitForMerges(), however this situation points out that even the SerialMergeScheduler should have an actual implementation for a sync() method, which would block until the thread actually doing the merge has completed. (as i may be calling sync() from a different thread other than the one the IndexWriter called merge() on) SerialMergeScheduler should therefore have a lock that will be held while merging, and a sync() method should be added that will just acquire and release the lock. Making both the sync() and merge() methods on the SerialMergeScheduler would achieve this (and the sync would just be a synchronized noop)

It seems more natural to me to put this "sync" on the IndexWriter itself, especially as this will be completely agnostic to the merge scheduler used.


for the "periodic" waiting for merge thread completion, this would be driven by messages from client code to request a "soft optimize" perhaps, which would just wait for background merges to complete. This could then result in turning over a new IndexReader for more efficient searches than using the old IndexReader (which may be more segmented). This message asking for a "soft optimize" may be sent on some scheduled basis in order to achieve better search performance (without the cost of an explicit optimize)


Discussion is all well and good, and i definitely appreciate all comments.
Even if this doesn't end up going in, you've pointed out another solution (using expungeDeletes()) which will achieve the same solution for me at least.



> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722069#action_12722069 ] 

Jason Rutherglen commented on LUCENE-1703:
------------------------------------------

Seems like a useful feature.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith updated LUCENE-1703:
------------------------------

    Attachment: IndexWriter.java.diff

Very minor change
moved assert on mergingSegments.size() into waitForMerges()

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722292#action_12722292 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

I'm not against the default, just thought that waitForMerges() will be even more useful if all of these didn't have the doWait variant, and it will reduce the number of methods on IW. We can leave them around though, I guess it doesn't hurt.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722040#action_12722040 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

There are also "merges" that take place outside of the merge scheduler, eg addIndexesNoOptimize copies over the tail segments by running its own merges.  Also, with LUCENE-1313, there may be merges that run entirely in RAM, not under MergeScheduler.

I think it makes sense to expose waitForMerges in IW (instead of duplicating the code in every merge scheduler).  We may be able to then deprecate CMS.sync?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722819#action_12722819 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

I also noticed that at least on expungeDeletes() these methods may only wait for merges that the method itself initiated

the waitForMerges() method would wait for all merges regardless of who initiated the merge.

so:
{code}
try {
  expungeDeletes(false);
} finally {
  waitForMerges();
}
{code}

would not be the same as
{code}
expungeDeletes(true);
{code}

as the later would only wait for the merges actually caused by the expungeDeletes() method

This would indicate that the waitForMerges=true/false variants of different methods should remain

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721908#action_12721908 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

thought maybe that method would do it, however that method is undocumented (no javadoc) 

ideally, the IndexWriter would have these semantics in order to handle any configured MergeScheduler that may perform background syncs

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721987#action_12721987 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

I'm just playing the devil's advocate here. 

I'm still not sure that this method is needed. A MergeScheduler is given to IndexWriter, and therefore your code should be able to sync() on it, whether it's ConcurrentMergeScheduler or a different one which does background merges, since you control its creation. If it's something that needs to improve on CMS, let's improve it (like documenting sync()).

bq. My primary use case for this is to stabilize an index prior to performing a backup operation

Still .. if you call any of the methods I've mentioned, you can change your code to call the doWait variants, except for commit(). If commit() is what you're after, wouldn't it be better to add a commit(doWait) so that you can call it w/ doWait=true?

BTW, Mike McCandless has written a great article on Hot Backups w/ Lucene, which you can find here: http://www.manning.com/free/green_HotBackupsLucene.html (need to register to Manning first). If you're not familiar w/ it, I suggest you read it since it shows a nice and elegant ways to perform backups, w/o the need to sync your writer.

I'm planning to try it out. One thing you can do, after calling snapshot() is to compare the files returned from snapshot() to the ones in the backup and: (1) delete the ones in backup that do not appear in the returned list of files, and copy over the files that do not exist in the backup (except for .lock). The article is really good and self explanatory.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722231#action_12722231 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

bq. Though we have a migration challenge (if we want to swap the default). Or we add a matchVersion.

If we change the default to not wait, then I don't think matchVersion will matter. Since I'll need to update my code anyway, I don't think I'd care whether I call waitForMerges(), or matchVersion, even though the latter is called only once and the change is required in just one place. matchVersion, IMO, is required in more complicated cases, like that ACRONYM issue, which my code really has no control over.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721951#action_12721951 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

May I ask what's the use case for this? I looked at IndexWriter and the following have doWait versions: close(), optimize() and expungeDeletes(). The only one that doesn't is commit(), which documents it will not wait for merges. So this seems to be a method just for

{code}
try { 
  writer.commit(); 
} finally { 
  writer.waitForMerges(); 
}
{code}

Is there any other use case that you would want to wait for merges to finish?

It's just that adding methods means we need to support them going forward, and if there's no real use case (or at least none that is covered by current API), we should not do it?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722003#action_12722003 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

MergeScheduler does not provide a sync() method in its abstract class. Therefore, an implementation could be used that would not derive from or provide a sync() method

If it did, this would be a more than acceptable solution instead of putting the method on the IndexWriter. (i potentially want to allow configuration to specify what MergeScheduler to use, and if sync() isn't specified on the base class, i can't reliably use it (will have to do instanceof checks, and a new concurrent based scheduler not derived from ConcurrentMergeScheduler breaks this)

Also, the SerialMergeScheduler may be in use and be running due to another thread calling addDocument triggering a merge, but i want my second thread to wait for that merge to complete (this would require implementing sync() on SerialMergeScheduler to allow other threads to sync on the merge as well)

optimize() is not a viable option to use to wait for merges as this could take a long time (hours)
close() requires reopening the IndexWriter (which i am trying to avoid in the first place)
expungeDeletes(true) is doing more work that is actually required (this could be a viable solution, however its a bit odd to "waitForMerges()" by calling a method called expungeDeletes())

I will try getting access to the article you suggest, however i still expect i will want this method at the end of the day (and will end up using expungeDeletes(true) if it never gets added (with copious comments indicating that expungeDeletes() is actually being used to wait for background merges)

Another use for waitForMerges() is as a more lightweight version of optimize().
this will allow blocking adding more documents until background merges are complete periodically, preventing from throttling the IO on the indexer from being hit too hard. Could call IndexWriter.maybeMerge() followed by IndexWriter.waitForMerges()





> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722057#action_12722057 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

{quote}
one thread may call addDocument() (or maybeMerge() to be more to the point)
this thread could result in the SerialMergeScheduler to start merging (addDocument() won't return until this merge completes)
{quote}

Again, if autoCommit=false, how can this happen? I thought that if autoCommit=false, no commit happens and therefore no segment merging?

bq. I think it makes sense to expose waitForMerges in IW (instead of duplicating the code in every merge scheduler). We may be able to then deprecate CMS.sync?

I guess I'm still not convinced what simplicity would that bring. For Tim's use case, just two threads, using SMS, that might work. But for the general use case, or one which uses multiple indexing threads, one of which may call commit() at some point, another daemon may run optimize(), I dunno, this would still require syncing all threads around that waitForMerges call, if the intent is to prevent document additions while merges occur. Therefore this method is not expected to make my life any easier, except that if I need to report "no merges are running" or make a decision based on "no merges are running" I should have one thread call this waitForMerges()

Of course, still playing the devil's advocate, you could call waitForMerges() which will return immediately b/c there are no merges to do or that are running, and soon as that happens, a context switch also happens, and an innocent addDocument will trigger a 50-segments merge, at which point whatever you thought to do b/c there are no merges, will hit the exact scenario you were trying to avoid all that time :).

I'll admit though that it's late here (1 AM), and perhaps I'm not seeing this clearly. And .. I still don't understand how if autocommit=false, and addDocument/deleteDocument can trigger a merge.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722834#action_12722834 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

bq. This would indicate that the waitForMerges=true/false variants of different methods should remain

Ahh good point; and I think the default should be to wait.  OK I plan to commit the current patch in a day or two.  Thanks Tim!

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1703:
------------------------------------------

    Assignee: Michael McCandless

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721911#action_12721911 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

bq. ideally, the IndexWriter would have these semantics in order to handle any configured MergeScheduler that may perform background syncs

I agree.  Wanna cons up a patch?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722169#action_12722169 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

bq. or .. deprecate all the variants of the other methods that wait for merges, and say that the default behavior will not wait, and if you want this, call waitForMergest() afterwards?

+1

Though we have a migration challenge (if we want to swap the default).  Or we add a matchVersion.

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721954#action_12721954 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

My primary use case for this is to stabilize an index prior to performing a backup operation

If background merges are ongoing, then partial segment files will be written to disk (potentially while an rsync or other operation is running)

this will result in partial segments being backed up

while this should still be a viable index, as the partial segments have not been "committed", it results in an unclean backup (this can result in rsync throwing warnings/errors (which operations people greatly dislike)





> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721913#action_12721913 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

I'm not super familiar with internals of IndexWriter, but i'll give it a shot

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721949#action_12721949 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

Patch looks good!  I don't think you actually need a separate unit test, since (as you pointed out), this method is well tested internally?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721938#action_12721938 ] 

Tim Smith commented on LUCENE-1703:
-----------------------------------

I'm finding it a bit tricky to create a proper unit test for this

This is indirectly tested as IndexWriter.close() will call this method

but explicit calling of this method is harder to test as it requires outstanding merges to be running (very difficult to trigger)

I started with TestIndexWriter.testNoWaitClose() as a base
however, modifying this to test waitForMerges() takes 11 seconds to run (surely because it always waits, and doesn't abort the merges)

also, in order to properly test, another method should be added to IndexWriter
{code}
public synchronized int getOutstandingMerges() {
  return mergingSegments.size();
}
{code}

I suppose the most proper test would create a subclass of ConcurrentMergeScheduler that would block all merges until after i released a lock.  Then the test would add docs, fork off thread to unlock for merging in X seconds, then call waitForMerges.


> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722145#action_12722145 ] 

Shai Erera commented on LUCENE-1703:
------------------------------------

Right ... forgot the merges to the uncommitted segments. If we do this, can we also add a commit() version which waits for merges, to complete the set of operations that allow you to wait for merges (just for convenience, so that you don't call commit() followed by a waitForMerges()), or .. deprecate all the variants of the other methods that wait for merges, and say that the default behavior will not wait, and if you want this, call waitForMergest() afterwards?

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1703) Add a waitForMerges() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722168#action_12722168 ] 

Michael McCandless commented on LUCENE-1703:
--------------------------------------------

bq. After I added a number of documents to the index some of the existing not-yet committed segments get merged. It will never merge the current existing and committed segments, but the segments created during indexing can be merged. After commit() or close() the already existing and the new segments are maybe additionally merged and a new segments file written.

Actually, existing segments do get merged (assuming the merge policy selects them).  It's just that they cannot be deleted because logically there are two snapshots while an IndexWriter is running with autoCommit false: the index as of when it was opened, and the current in-memory SegmentInfos.  (And if you are using a deletion policy different from the default KeepOnlyLastCommit, even more snapshots will be kept).

> Add a waitForMerges() method to IndexWriter
> -------------------------------------------
>
>                 Key: LUCENE-1703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1703
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriter.java.diff, IndexWriter.java.diff
>
>
> It would be very useful to have a waitForMerges() method on the IndexWriter.
> Right now, the only way i can see to achieve this is to call IndexWriter.close()
> ideally, there would be a method on the IndexWriter to wait for merges without actually closing the index.
> This would make it so that background merges (or optimize) can be waited for without closing the IndexWriter, and then reopening a new IndexWriter
> the close() reopen IndexWriter method can be problematic if the close() fails as the write lock won't be released
> this could then result in the following sequence:
> * close() - fails
> * force unlock the write lock (per close() documentation)
> * new IndexWriter() (acquires write lock)
> * finalize() on old IndexWriter releases the write lock
> * Index is now not locked, and another IndexWriter pointing to the same directory could be opened
> If you don't force unlock the write lock, opening a new IndexWriter will fail until garbage collection calls finalize() the old IndexWriter
> If the waitForMerges() method is available, i would likely never need to close() the IndexWriter until right before the process being shutdown, so this issue would not occur (worst case scenario, the waitForMerges() fails)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org