Posted to java-user@lucene.apache.org by karl wettin <ka...@gmail.com> on 2007/08/20 04:40:55 UTC

Lockless read-only deletions in IndexReader?

I want to set documents in my IndexReader as deleted, but I will  
never commit these deletions. Sort of a filter on a reader rather  
than on a searcher, and no write-locks.

Can I do that out of the box?

Perhaps I can pass down an IndexDeletionPolicy to my IndexWriter that
ignores deletions from the IndexReader(s) to avoid the lock?

Would changing the directory lock factory affect the IndexWriter
locks too? If so, that is not an option.

I could go hacking in IndexReader, definalizing it for decoration of
deleteDocument(int), or something like that, but would really prefer
not to.


This is for a transactional layer on top of Lucene, where I combine  
the system index with a transactional index. Updated documents that  
are represented in the transaction index must be filtered out from  
the system index at IndexReader level without creating write-locks.  
undeleteAll() would be an option if there were no locks -- more than
one transaction could be updating documents at the same time.


-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lockless read-only deletions in IndexReader?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Excellent, a much simpler approach!

I think it should work?  Maybe override numDocs() as well?
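
On the numDocs() point: FilterIndexReader.numDocs() simply delegates to the wrapped reader, so without an override the count would still include documents hidden by the filter. Below is a minimal sketch of the counting logic only. It is written against a hypothetical DocSource interface, a stand-in for the two IndexReader methods it needs, so that it runs without Lucene on the classpath. DeleteFilteredSource, hide() and NumDocsDemo are likewise made-up names, not part of Lucene or of the class proposed in this thread.

```java
import java.util.BitSet;

/**
 * Stand-in for the two IndexReader methods the sketch needs.
 * Hypothetical: this is not a Lucene type.
 */
interface DocSource {
    int maxDoc();
    boolean isDeleted(int n);
}

/** Hides extra documents on top of a DocSource without modifying it. */
class DeleteFilteredSource implements DocSource {
    private final DocSource in;
    private final BitSet filter;

    DeleteFilteredSource(DocSource in) {
        this.in = in;
        this.filter = new BitSet(in.maxDoc());
    }

    /** Marks a document as deleted in this view only (hypothetical helper). */
    void hide(int n) {
        if (n < 0 || n >= in.maxDoc()) {
            throw new IndexOutOfBoundsException("doc " + n);
        }
        filter.set(n);
    }

    public int maxDoc() {
        return in.maxDoc();
    }

    public boolean isDeleted(int n) {
        return filter.get(n) || in.isDeleted(n);
    }

    /** Counts documents deleted neither by the filter nor underneath. */
    int numDocs() {
        int live = 0;
        for (int n = 0; n < maxDoc(); n++) {
            if (!isDeleted(n)) {
                live++;
            }
        }
        return live;
    }
}

class NumDocsDemo {
    /** Builds a 5-document source in which document 1 is already deleted. */
    static DocSource fiveDocsOneDeleted() {
        return new DocSource() {
            public int maxDoc() { return 5; }
            public boolean isDeleted(int n) { return n == 1; }
        };
    }

    public static void main(String[] args) {
        DeleteFilteredSource view = new DeleteFilteredSource(fiveDocsOneDeleted());
        view.hide(1); // already deleted underneath; must not be counted twice
        view.hide(3);
        System.out.println(view.numDocs()); // prints 3
    }
}
```

The linear scan is the simplest way to avoid double-counting a document that is deleted both in the filter and underneath. Caching the count and invalidating it in hide() would avoid rescanning, at the cost of more state.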

Mike

"Karl Wettin" <ka...@gmail.com> wrote:
> 
> On 20 Aug 2007, at 14:33, Michael McCandless wrote:
> 
> > "karl wettin" <ka...@gmail.com> wrote:
> >
> >> I want to set documents in my IndexReader as deleted, but I will
> >> never commit these deletions. Sort of a filter on a reader rather
> >> than on a searcher, and no write-locks.
> >
> 
> >> I could go hacking in IndexReader, definalizing it for decoration of
> >> deleteDocument(int), or something like that, but would really
> >> prefer not to.
> >
> > Yeah I think it may just be cleanest to modify IndexReader to not
> > acquire the write lock nor commit its changes to the Directory on
> > close.
> 
> How about something as simple as this:
> 
> public class DeleteFilteredIndexReader extends FilterIndexReader {
> 
>    public DeleteFilteredIndexReader(IndexReader in) {
>      super(in);
>      filter = new BitSet(in.maxDoc());
>    }
> 
>    private BitSet filter;
> 
>    public boolean isDeleted(int n) {
>      return filter.get(n) || super.isDeleted(n);
>    }
> 
>    public boolean hasDeletions() {
>      return filter.nextSetBit(0) > -1 || super.hasDeletions();
>    }
> }
> 
> 
> I have not looked at the BitSet source yet, but I suppose the
> nextSetBit(0) call could be optimized.
> 
> 
> -- 
> karl



Re: Lockless read-only deletions in IndexReader?

Posted by Karl Wettin <ka...@gmail.com>.
On 20 Aug 2007, at 14:33, Michael McCandless wrote:

> "karl wettin" <ka...@gmail.com> wrote:
>
>> I want to set documents in my IndexReader as deleted, but I will
>> never commit these deletions. Sort of a filter on a reader rather
>> than on a searcher, and no write-locks.
>

>> I could go hacking in IndexReader, definalizing it for decoration of
>> deleteDocument(int), or something like that, but would really
>> prefer not to.
>
> Yeah I think it may just be cleanest to modify IndexReader to not
> acquire the write lock nor commit its changes to the Directory on
> close.

How about something as simple as this:

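import java.util.BitSet;

import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;

public class DeleteFilteredIndexReader extends FilterIndexReader {

   // Documents hidden by this reader only; never written to the Directory.
   private final BitSet filter;

   public DeleteFilteredIndexReader(IndexReader in) {
     super(in);
     filter = new BitSet(in.maxDoc());
   }

   // Deleted if hidden by the filter or deleted in the wrapped reader.
   public boolean isDeleted(int n) {
     return filter.get(n) || super.isDeleted(n);
   }

   // True if any bit is set in the filter or the wrapped reader has deletions.
   public boolean hasDeletions() {
     return filter.nextSetBit(0) > -1 || super.hasDeletions();
   }
}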


I have not looked at the BitSet source yet, but I suppose the
nextSetBit(0) call could be optimized.
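
One candidate for that check is java.util.BitSet.isEmpty(), which answers the same question ("is any bit set?") directly. The sketch below only demonstrates that the two expressions agree. Whether isEmpty() is actually faster is not measured here, and BitSetEmptinessDemo and its method names are made up for illustration.

```java
import java.util.BitSet;

class BitSetEmptinessDemo {

    /** The check as written in the post: scan for the first set bit. */
    static boolean hasBitsViaScan(BitSet bits) {
        return bits.nextSetBit(0) > -1;
    }

    /** The same question asked through BitSet.isEmpty(). */
    static boolean hasBitsViaIsEmpty(BitSet bits) {
        return !bits.isEmpty();
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet(16);
        // No bits set yet: both checks report "no deletions".
        System.out.println(hasBitsViaScan(filter) + " " + hasBitsViaIsEmpty(filter));

        filter.set(7);
        // One bit set: both checks report "has deletions".
        System.out.println(hasBitsViaScan(filter) + " " + hasBitsViaIsEmpty(filter));

        filter.clear(7);
        // Bit cleared again: both checks are back to "no deletions".
        System.out.println(hasBitsViaScan(filter) + " " + hasBitsViaIsEmpty(filter));
    }
}
```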


-- 
karl



Re: Lockless read-only deletions in IndexReader?

Posted by Karl Wettin <ka...@gmail.com>.
Thanks Mike, I'll dig around a bit and report back what I did.


On 20 Aug 2007, at 14:33, Michael McCandless wrote:

>
> "karl wettin" <ka...@gmail.com> wrote:
>
>> I want to set documents in my IndexReader as deleted, but I will
>> never commit these deletions. Sort of a filter on a reader rather
>> than on a searcher, and no write-locks.
>
>> Perhaps I can pass down an IndexDeletionPolicy to my IndexWriter that
>> ignores deletions from the IndexReader(s) to avoid the lock?
>
> I don't think this will easily work.  The IndexDeletionPolicy only
> controls which commits get deleted.  So if you open a reader,
> do deletes, and close it, a new commit (segments_N+1) is created.
>
> You could have your deletion policy at this point preserve the
> previous commit, but then there's no way to tell IndexWriter on
> opening the index to open the previous commit (it always opens the
> most recent commit).
>
> I guess if IndexWriter/IndexReader could open old commits, and if your
> app could separately keep track of which commits should be ignored
> (because they were from readers), then you could do it?
>
> The problem is the reader will still write the commit into the
> directory on close...
>
>> Would changing the directory lock factory affect the IndexWriter
>> locks too? If so, that is not an option.
>
> Could you just use a NullLockFactory on the reader?  That way, when
> you do deletes in the reader and it tries to acquire the write lock,
> it doesn't in fact acquire the one seen by IndexWriter (which would
> presumably be using a "real" LockFactory)?
>
> But, the reader will still write into the directory and stomp on any
> IndexWriter that was also committing ....
>
>> I could go hacking in IndexReader, definalizing it for decoration of
>> deleteDocument(int), or something like that, but would really
>> prefer not to.
>
> Yeah I think it may just be cleanest to modify IndexReader to not
> acquire the write lock nor commit its changes to the Directory on
> close.
>
>> This is for a transactional layer on top of Lucene, where I combine
>> the system index with a transactional index. Updated documents that
>> are represented in the transaction index must be filtered out from
>> the system index at IndexReader level without creating
>> write-locks. undeleteAll() would be an option if there were no locks
>> -- more than one transaction could be updating documents at the same
>> time.
>
> Sounds wild!
>
> Mike
>




Re: Lockless read-only deletions in IndexReader?

Posted by Michael McCandless <lu...@mikemccandless.com>.
"karl wettin" <ka...@gmail.com> wrote:

> I want to set documents in my IndexReader as deleted, but I will
> never commit these deletions. Sort of a filter on a reader rather
> than on a searcher, and no write-locks.

> Perhaps I can pass down an IndexDeletionPolicy to my IndexWriter that
> ignores deletions from the IndexReader(s) to avoid the lock?

I don't think this will easily work.  The IndexDeletionPolicy only
controls which commits get deleted.  So if you open a reader,
do deletes, and close it, a new commit (segments_N+1) is created.

You could have your deletion policy at this point preserve the
previous commit, but then there's no way to tell IndexWriter on
opening the index to open the previous commit (it always opens the
most recent commit).

I guess if IndexWriter/IndexReader could open old commits, and if your
app could separately keep track of which commits should be ignored
(because they were from readers), then you could do it?

The problem is the reader will still write the commit into the
directory on close...

> Would changing the directory lock factory affect the IndexWriter
> locks too? If so, that is not an option.

Could you just use a NullLockFactory on the reader?  That way, when
you do deletes in the reader and it tries to acquire the write lock,
it doesn't in fact acquire the one seen by IndexWriter (which would
presumably be using a "real" LockFactory)?

But, the reader will still write into the directory and stomp on any
IndexWriter that was also committing ....

> I could go hacking in IndexReader, definalizing it for decoration of
> deleteDocument(int), or something like that, but would really
> prefer not to.

Yeah I think it may just be cleanest to modify IndexReader to not
acquire the write lock nor commit its changes to the Directory on
close.

> This is for a transactional layer on top of Lucene, where I combine
> the system index with a transactional index. Updated documents that
> are represented in the transaction index must be filtered out from
> the system index at IndexReader level without creating
> write-locks. undeleteAll() would be an option if there were no locks
> -- more than one transaction could be updating documents at the same
> time.

Sounds wild!

Mike
