Posted to java-user@lucene.apache.org by Colin Goodheart-Smithe <co...@googlemail.com> on 2012/06/06 13:16:11 UTC

IndexCommit.delete() outside of IndexDeletionPolicy

I was looking at the Lucene API for IndexCommit and noticed that the
JavaDoc for IndexCommit.delete() states that

'Decision that a commit-point should be deleted is taken by the
IndexDeletionPolicy in effect and therefore this should only be called
by its onInit() or onCommit() methods.'

IndexCommit.delete():
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexCommit.html#delete()
IndexDeletionPolicy:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexDeletionPolicy.html

I was wondering why this is the case and whether deleting IndexCommits
outside of an IndexDeletionPolicy is actually a bad idea?

To put some context around this, I am looking to implement a deletion policy
that is independent of IndexWriter commits and instead depends on the
processes using a particular commit point having finished with it.
The logic would look something like the following, with state stored in
something like ZooKeeper so that I can use ephemeral nodes and watcher
events:

   - IndexWriters would have a NoDeletionPolicy set
   - Each time a process opens a session it registers an ephemeral node
   - The session is assigned the current (latest) commit point
   - Each time a process removes the node (either through crashing or
   having finished the job) a watch event is fired, and a separate process
   deletes the commit point the process was using, provided no other
   processes are using it and it is not the latest commit point
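
The bookkeeping in the list above can be sketched roughly as follows. This is only an in-memory illustration: the class CommitRefCounts and its method names are hypothetical, a plain map stands in for the ZooKeeper ephemeral nodes, and commit points are identified by their generation number (an assumption made for the sketch).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory stand-in for the shared session registry (ZooKeeper in the
 * proposal above). Class and method names are hypothetical.
 */
class CommitRefCounts {
    // commit generation -> number of open sessions pinned to that commit
    private final Map<Long, Integer> refs = new HashMap<Long, Integer>();

    /** A session opens against the given commit generation. */
    synchronized void acquire(long generation) {
        Integer n = refs.get(generation);
        refs.put(generation, n == null ? 1 : n + 1);
    }

    /** A session finished or crashed (its ephemeral node disappeared). */
    synchronized void release(long generation) {
        Integer n = refs.get(generation);
        if (n == null) {
            return; // unknown or already released; nothing to do
        }
        if (n <= 1) {
            refs.remove(generation);
        } else {
            refs.put(generation, n - 1);
        }
    }

    /** True if no session uses the commit and it is not the latest one. */
    synchronized boolean isDeletable(long generation, long latestGeneration) {
        return generation != latestGeneration && !refs.containsKey(generation);
    }
}
```

The watcher process would call release() when a node goes away and then check isDeletable() before touching the commit point.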

Processes may have fairly long-running sessions, so across all the processes
a reasonable number of commit points might be in use.  I don't really want
to have to wait for a commit from the IndexWriter (which may not happen for
a while) to clear up the older commit points I no longer need.  Would this
logic pose any issues, given that it is going to be deleting commit points
outside of the IndexDeletionPolicy?

Re: IndexCommit.delete() outside of IndexDeletionPolicy

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think this use case makes sense; such logic (for a distributed / ref
counted deletion policy) would make a nice contribution ... it's the
"proper" way to delete commits when multiple nodes are in use (vs eg
using a timeout deletion policy).

You can actually do it today: call IndexWriter.deleteUnusedFiles.
That visits the deletion policy and then you have a chance to delete
commit points (it'd mean you have to set a real deletion policy on the
writer, which in turn goes and checks the reference counts across all
nodes).
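
A rough sketch of the shape such a policy might take is below. To keep it runnable without Lucene on the classpath, CommitPoint is a stand-in for the two IndexCommit methods it uses (getGeneration() and delete()), and CommitUsage is a hypothetical lookup of the cross-node reference counts; neither name comes from Lucene. A real version would implement IndexDeletionPolicy and take List<? extends IndexCommit>.

```java
import java.util.List;

/** Stand-in for the two IndexCommit methods this sketch relies on. */
interface CommitPoint {
    long getGeneration();
    void delete();
}

/** Hypothetical lookup of cross-node reference counts (e.g. backed by ZooKeeper). */
interface CommitUsage {
    boolean isInUse(long generation);
}

/**
 * Shape of a reference-counting deletion policy. With Lucene this class
 * would implement IndexDeletionPolicy instead of standing alone.
 */
class RefCountingPolicySketch {
    private final CommitUsage usage;

    RefCountingPolicySketch(CommitUsage usage) {
        this.usage = usage;
    }

    void onInit(List<? extends CommitPoint> commits) {
        onCommit(commits);
    }

    void onCommit(List<? extends CommitPoint> commits) {
        // Commits arrive oldest first; the last one is the latest and is always kept.
        for (int i = 0; i < commits.size() - 1; i++) {
            CommitPoint c = commits.get(i);
            if (!usage.isInUse(c.getGeneration())) {
                c.delete();
            }
        }
    }
}
```

With the real classes, the policy would be set on the writer (for example via IndexWriterConfig.setIndexDeletionPolicy), and the process reacting to a released session would call IndexWriter.deleteUnusedFiles() so the policy is consulted without waiting for the next commit.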

Mike McCandless

http://blog.mikemccandless.com


