Posted to java-user@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/01/23 21:25:28 UTC

IndexReader.isDeleted

We are considering replacing the current random-access
IndexReader.isDeleted(int docID) method with an iterator & skipTo
(DocIdSet) access that would let you iterate through the deleted
docIDs, instead.
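
As a rough illustration of the iterator & skipTo style of access described
above, here is a minimal sketch. It is not Lucene code: DeletedDocsIterator,
NO_MORE_DOCS and the BitSet backing are all hypothetical stand-ins, chosen
only so the example runs on its own with the JDK.

```java
import java.util.BitSet;

/** Hypothetical stand-in for an iterator over deleted docIDs; not a Lucene class. */
final class DeletedDocsIterator {
    /** Sentinel returned once the iterator is exhausted (an assumption of this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final BitSet deleted; // bit i set means docID i is deleted
    private int current = -1;     // last docID returned, -1 before the first call

    DeletedDocsIterator(BitSet deleted) {
        this.deleted = deleted;
    }

    /** Advances to the next deleted docID, or NO_MORE_DOCS when exhausted. */
    int next() {
        return skipTo(current + 1);
    }

    /** Advances to the first deleted docID >= target, never moving backwards. */
    int skipTo(int target) {
        if (current == NO_MORE_DOCS) {
            return NO_MORE_DOCS;
        }
        int from = Math.max(target, current + 1);
        int found = deleted.nextSetBit(from);
        current = (found < 0) ? NO_MORE_DOCS : found;
        return current;
    }
}

final class DeletedDocsDemo {
    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(5);
        deleted.set(9);

        DeletedDocsIterator it = new DeletedDocsIterator(deleted);
        for (int doc = it.next(); doc != DeletedDocsIterator.NO_MORE_DOCS; doc = it.next()) {
            System.out.println("deleted docID: " + doc);
        }
    }
}
```

The point of skipTo is that a caller already positioned at some docID can jump
straight to the next deletion at or after it, rather than asking about every
docID one at a time.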

At the same time we would move to a new API to replace
IndexReader.document(int docID) that would no longer check whether the
document is deleted.

This is being discussed now under several Jira issues and on
java-dev.

Would this be a problem for any Lucene applications out there?

How is isDeleted used today (outside of Lucene)?  Normally an
IndexSearcher would never return a deleted document, and so "in
theory" a deleted docID should never "escape" Lucene's APIs.

So I'm curious what applications in fact rely on isDeleted, and how
that method is being used...

Thanks,

Mike


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexReader.isDeleted

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, interesting, thanks.  What do you use the deletedDocs iterator for?

Yes, MatchAllDocsQuery should soon be fixed to not use the
synchronized IndexReader.isDeleted method internally:

     https://issues.apache.org/jira/browse/LUCENE-1316

Mike

John Wang wrote:

> Mike:
>       "We are considering replacing the current random-access
> IndexReader.isDeleted(int docID) method with an iterator & skipTo
> (DocIdSet) access that would let you iterate through the deleted
> docIDs, instead."
>
>      This is exactly what we are doing. However, we do have to build the
> internal DocIdSet from isDeleted calls. It would be great if this were provided
> through the API.
>
>      I am also assuming MatchAllDocsQuery has been fixed to avoid the isDeleted call?
>
> -John




Re: IndexReader.isDeleted

Posted by John Wang <jo...@gmail.com>.
Mike:
       "We are considering replacing the current random-access
IndexReader.isDeleted(int docID) method with an iterator & skipTo
(DocIdSet) access that would let you iterate through the deleted
docIDs, instead."

      This is exactly what we are doing. However, we do have to build the
internal DocIdSet from isDeleted calls. It would be great if this were provided
through the API.
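
For readers wondering what "building the set from isDeleted calls" might look
like, here is a hedged sketch. It is not our actual code and it does not use
Lucene types: DeletedSetBuilder is a hypothetical name, and the IntPredicate
stands in for a per-document check such as IndexReader.isDeleted(int), with
maxDoc standing in for the reader's maxDoc() value.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical helper; the predicate stands in for a per-document isDeleted check. */
final class DeletedSetBuilder {

    /** Probes every docID in [0, maxDoc) once and records the deleted ones. */
    static BitSet build(int maxDoc, IntPredicate isDeleted) {
        BitSet deleted = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (isDeleted.test(doc)) {
                deleted.set(doc);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Toy predicate for illustration only: pretend every fourth doc is deleted.
        BitSet deleted = build(10, doc -> doc % 4 == 0);
        System.out.println("deleted docIDs: " + deleted);
    }
}
```

The cost is one full pass of random-access calls up front, which is the part
an API-provided set or iterator would make unnecessary.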

      I am also assuming MatchAllDocsQuery has been fixed to avoid the isDeleted call?

-John


Re: IndexReader.isDeleted

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, interesting.  This case looks like it'd be a good fit for an
iteration API to access deleted docs.  (And a good case for
column-stride fields, too!)

Thanks for sharing, Ian,

Mike

Ian Lea wrote:

> Hi Mike
>
>
> I've got some applications that use lucene purely as a place to store
> data, with no searching other than by product id, and have programs
> that get all the data out of the store by code like
>
> for (int i = 0; i < max; i++) {
>     if (!reader.isDeleted(i)) {   // skip deleted docs
>         Document doc = reader.document(i);
>         ...
>     }
> }
>
> The index has regular updates and occasional optimizes so normally
> does contain deleted docs.
>
> If the isDeleted() method was removed it would only be a minor
> inconvenience - I'd be happy to code to any new API calls, or change
> the method to call optimize first, or whatever.
>
>
>
> --
> Ian.
> ian.lea@gmail.com




Re: IndexReader.isDeleted

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2009-01-26 at 11:55 +0100, Ian Lea wrote:
> for (int i = 0; i < max; i++) {
>     if (!reader.isDeleted(i)) {   // skip deleted docs
>         Document doc = reader.document(i);
>         ...
>     }
> }

Hey! You've stolen our code! :-)

While we don't use Lucene in the same way as you, we also perform
iterations over all documents. An iterative approach to deleted
documents, instead of the current random access, would be fine by us.
It'll just take a minor refactoring to get it to work on our end.

- Toke Eskildsen, http://statsbiblioteket.dk




RE: IndexReader.isDeleted

Posted by Uwe Schindler <uw...@thetaphi.de>.
The same here:
I often have code that needs to get *all* documents out of an IndexReader
(excluding deleted docs). Currently this is coded like the example from Ian.

This is often code that checks content of indexes or iterates over all
documents without any Query.

A better API may be good.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de






Re: IndexReader.isDeleted

Posted by Ian Lea <ia...@gmail.com>.
Hi Mike


I've got some applications that use lucene purely as a place to store
data, with no searching other than by product id, and have programs
that get all the data out of the store by code like

for (int i = 0; i < max; i++) {
    if (!reader.isDeleted(i)) {   // skip deleted docs
        Document doc = reader.document(i);
        ...
    }
}

The index has regular updates and occasional optimizes so normally
does contain deleted docs.

If the isDeleted() method was removed it would only be a minor
inconvenience - I'd be happy to code to any new API calls, or change
the method to call optimize first, or whatever.
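
For what it's worth, the same "dump every live document" loop could probably
be rewritten against an iterator over deleted docIDs with little effort. The
sketch below is only a guess at the shape of such code, since the new API is
not defined yet: LiveDocsWalk is a hypothetical name, and a plain JDK
PrimitiveIterator.OfInt stands in for whatever iterator Lucene might expose.
It assumes the deleted docIDs arrive in strictly increasing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Hypothetical sketch: visit live docIDs by walking alongside a deleted-docs iterator. */
final class LiveDocsWalk {

    /**
     * Returns every docID in [0, maxDoc) that deletedAscending does not produce.
     * deletedAscending must yield deleted docIDs in strictly increasing order.
     */
    static List<Integer> liveDocs(int maxDoc, PrimitiveIterator.OfInt deletedAscending) {
        List<Integer> live = new ArrayList<>();
        // Use maxDoc as "no more deletions", since no valid docID can equal it.
        int nextDeleted = deletedAscending.hasNext() ? deletedAscending.nextInt() : maxDoc;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc == nextDeleted) {
                // Skip this deleted doc and pull the next deleted docID.
                nextDeleted = deletedAscending.hasNext() ? deletedAscending.nextInt() : maxDoc;
            } else {
                live.add(doc); // a real application would load the stored document here
            }
        }
        return live;
    }

    public static void main(String[] args) {
        PrimitiveIterator.OfInt deleted = IntStream.of(1, 4).iterator();
        System.out.println(liveDocs(6, deleted));
    }
}
```

Compared with the loop above, the only structural change is that the
per-document isDeleted(i) check becomes a comparison against the next
deleted docID.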



--
Ian.
ian.lea@gmail.com


