You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2010/04/19 08:25:49 UTC

[jira] Created: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Add an explicit method to invoke IndexDeletionPolicy
----------------------------------------------------

                 Key: LUCENE-2402
                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Shai Erera
            Assignee: Shai Erera
             Fix For: 3.1


Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:

* Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
* I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
* TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
{code}
// Add one more document to force writer to commit a
// final segment, so deletion policy has a chance to
// delete again:
Document doc = new Document();
doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.addDocument(doc);
{code}

If IW had an explicit method, that code would not need to exist there at all ...

Here comes the fun part - naming the baby:
* invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
* deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.

BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2402:
-------------------------------

    Attachment: LUCENE-2402.patch

Adds revisitPolicy to IFD (package-private) and also calls it from IW.deleteUnusedFiles. All tests pass

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch, LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859369#action_12859369 ] 

Michael McCandless commented on LUCENE-2402:
--------------------------------------------

bq. The failure happens in CommitPoint's ctor in the assert statement which verifies the SegmentInos does not have external Directory.

Urgh.... indeed we must protect against one thread doing addIndexes
and another thread calling deleteUnusedFiles.

The way things work today (and I agree we should fix this) is
addIndexes* immediately modify the in memory segments to include
foreign (external Dir) segments, then proceed to target these foreign
segments by merging them away.

bq. I've tried to sync on commitLock (which seems good anyway), but the test kept failing.

This isn't strictly necessary, I think?  The two ops (ongoing commit
-- takes time since fync can be so slow -- and deleting unused files)
are orthogonal.  They both invoke IDP/IFD, but this is still protected
(sync'd on IW)...

bq. Only when passing rollbackSI to checkpoint does the test pass.

In fact this is the right track, I think...  rollbackSI is a clone of
the last committed segments, whereas the "live" segments contains all
uncommitted stuff that's happened since.  We really should not be
treating these pending changes as if they were a commit point... so
using rollbackSI makes sense.

But, the problem is, IFD.checkpoint will hold a new commit point when
you pass isCommit=true, which is no good.  I think we need to open up
a new package private method in IFD, eg "revisitPolicy" or some such, which
just does:
{code}
if (infoStream != null) {
  message("now visit...");
}
deletePendingFiles();
policy.onCommit(commits);
deleteCommits();
{code}

Ie, most of what IFD.checkpoint does when isCommit=true, minus the
incRef (which has already been done, in the past, for this segments)
and the commits.add of a new commit point.

Invoking IDP.onCommit still isn't quite right (no new commit was done)
but I think it's OK for now?  (Adding some kind of "visit" method
feels like overkill...).

bq. BTW, the test fails on DirReader.doClose, where it checks if writer != null and then calls deleteUnusedFiles. So I guess it's a NRT problem only.

Hmm the problem should be wider than just NRT.  Any time one thread
calls deleteUnusedFiles while another is doing addIndexes*, this bug
should be hit-able.

bq. In general, that that addIndexesNoOptimize messes w/ SI seems dangerous to me, because that's undocumented and unprotected

I agree.

I'd love to [eventually] change addIndexes*, so that it does all its
work "privately" and only in the end atomically "checks in" the
(not-foreign) segments it produced.  It gets tricky, though, since
"normal" segment merging, and flushing, is still ongoing, and we'd not
want to do redundant merging work.

This also messes up NRT, ie, if you open an NRT reader during an
addIndexes*, you can see some segments already added and some now --
ie NRT violates the advertised atomicity of addIndexes* (the javadocs
note this).

I think we really need to factor IW apart:

  # Indexer (add/update/delete), also flushes new segments

  # Keeper of the segments file (exposes API to make atomic changes to
    segments file, does commits, interacts w/ IDP/IFD)

  # Merger (normal merging, optimize, expungeDeletes, addIndexes)

  # Reader pool


> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858426#action_12858426 ] 

Earwin Burrfoot commented on LUCENE-2402:
-----------------------------------------

Lets reuse IW.deleteUnusedFiles() ?
No need to multiply confusion )

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-2402.
--------------------------------

    Resolution: Fixed

Committed revision 936605.

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch, LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2402:
-------------------------------

    Attachment: LUCENE-2402.patch

Patch changes deleteUnusedFiles to call IFD.checkpoint and also adds a testDeleteUnusedFiles2 to TestIndexWriter.

Currently, TestIndexWriterReader.testDuringAddIndexes fails, if deleteUnusedFiles is coded like this:
{code}
public synchronized void deleteUnusedFiles() throws IOException {
  deleter.checkpoint(segmentInfos, true);
}
{code}

The failure happens in CommitPoint's ctor in the assert statement which verifies the SegmentInos does not have external Directory. When I debug-traced the test, it passed and so I concluded it's a concurrency issue (and indeed testDuringAddIndexes spawns several threads.

addIndexesNoOptimize does change SegmentInfos as it adds indexes, however at the end it 'fixes' their Directory reference. I wondered how is regular commit() works when addIndexesNoOptimize is called, but couldn't find any synchronization block where one blocks the other. Eventually, I've changed deleteUnusedFiles to this:
{code}
public synchronized void deleteUnusedFiles() throws IOException {
  synchronized (commitLock) {
    deleter.checkpoint(rollbackSegmentInfos, true);
    // deleter.checkpoint((SegmentInfos) segmentInfos.clone(), true);
  }
}
{code}

I've tried to sync on commitLock (which seems good anyway), but the test kept failing. Even cloning SI did not work because it might have changed just before the clone. Only when passing rollbackSI to checkpoint does the test pass. But I'm not sure if that's the right solution, as when I debug-traced it and put a break point just before the call to checkpoint, SI included one segment w/ a different name than rollbackSI ...

BTW, the test fails on DirReader.doClose, where it checks if writer != null and then calls deleteUnusedFiles. So I guess it's a NRT problem only.

In general, that that addIndexesNoOptimize messes w/ SI seems dangerous to me, because that's undocumented and unprotected - e.g. if someone extends IW and adds some logic which requires reading SI ... I'm not sure how to solve it, but that seems unrelated to that issue (probably much more complicated to solve).

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859389#action_12859389 ] 

Shai Erera commented on LUCENE-2402:
------------------------------------

Ok I understand. About the name, revisitPolicy is not exactly accurate (I think?) because it also deletes the pending files (and not just revisit the policy). Unless IW.deleteUnusedFiles will invoke both deletePendingFiles and revisitPolicy ... the latter will just do
{code}
if (commits.size() > 0) {
  policy.onCommit(commits);
  deleteCommits();
}
{code}

What do you think?

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859393#action_12859393 ] 

Michael McCandless commented on LUCENE-2402:
--------------------------------------------

+1, I think that's a good approach.

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859403#action_12859403 ] 

Michael McCandless commented on LUCENE-2402:
--------------------------------------------

Patch looks good Shai!

But you can remove the " ..." in the message output -- I had put into my code thinking there may be details we put instead of that "...", but, I think there are no further details.

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch, LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859418#action_12859418 ] 

Shai Erera commented on LUCENE-2402:
------------------------------------

ok I'll remove them before commit. Will commit this later - giving chance for more people to review.

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch, LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858738#action_12858738 ] 

Michael McCandless commented on LUCENE-2402:
--------------------------------------------

bq. Lets reuse IW.deleteUnusedFiles() ?

+1

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859381#action_12859381 ] 

Michael McCandless commented on LUCENE-2402:
--------------------------------------------

bq. So maybe reuse deletePendingFiles?

Hmm I think this is too much?  EG we call that on every checkpoint call... so this'd mean the IDP gets 2 onCommit calls per commit (one "fake" one and one "real")?

I think it's better if the "fake" onCommit only arrives when IW.deleteUnusedFiles is invoked?

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859377#action_12859377 ] 

Shai Erera commented on LUCENE-2402:
------------------------------------

bq. I think we need to open up a new package private method in IFD, eg "revisitPolicy" or some such

So maybe reuse deletePendingFiles? I.e. this method does not accept anything, and so seems that revisitPolicy won't (just pass on to IDP its 'commits' member). IFD is anyway an internal API ... I'll give it a try. deletePendingFiles would just do what it does, then in the end call policy.onCommit(commits) and deleteCommits() ... what do you think? And then IW does not need to change.

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy, or any other IDP in the tests, those segments are left in the index even if the IDP no longer references them, until IW.commit() is called (and actually does something). I'd like to add a specific method to IW which will invoke the IDP's logic and get rid of the unused segments w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim that space immediately. In some scenarios, the index is updated once in several hours (or even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can still wait for it if one wants, but otherwise it will give you the ability to immediately get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the user did not set IDP at all (relying on default, which I think many do), users won't understand what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments' are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint (instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org