Posted to java-user@lucene.apache.org by Stuart Goldberg <sg...@fixflyer.com> on 2018/07/19 19:23:15 UTC

Deleted documents and NRT Readers

I use NRT readers all the time. I create them with 'applyDeletes' set to
false for performance reasons and take the javadoc at its word that my code
has to be prepared to deal with deleted documents. I thought I understood
that and I wrote my code to be deleted-document-safe.
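
For illustration only, here is one way "deleted-document-safe" handling might look on the application side. The post does not say how the real code does it, so everything here is an assumption: that each document carries a unique key and an increasing version number, and the names Hit, StaleHitFilter and newestPerKey are hypothetical. The sketch simply keeps the newest copy per key and drops any superseded copies that a reader still returned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: drops superseded copies from a hit list. */
final class StaleHitFilter {

    /** Stand-in for a search hit; "key" and "version" are assumed application fields. */
    static final class Hit {
        final String key;
        final long version;
        final String payload;

        Hit(String key, long version, String payload) {
            this.key = key;
            this.version = version;
            this.payload = payload;
        }
    }

    /** Keeps only the highest-version hit per key, in first-seen key order. */
    static List<Hit> newestPerKey(List<Hit> hits) {
        Map<String, Hit> newest = new LinkedHashMap<>();
        for (Hit h : hits) {
            Hit seen = newest.get(h.key);
            if (seen == null || h.version > seen.version) {
                newest.put(h.key, h);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("order-1", 1, "old")); // superseded copy still visible to the reader
        hits.add(new Hit("order-1", 2, "new"));
        hits.add(new Hit("order-2", 1, "only"));
        for (Hit h : newestPerKey(hits)) {
            System.out.println(h.key + " v" + h.version + " " + h.payload);
        }
    }
}
```

Filtering after the search keeps the reader cheap to open, at the cost of occasionally fetching a copy that is then thrown away.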

But I have recently revisited the issue and tried to understand what
happens using a little test program. I create a document and add it to the
index. I then create a new document that mirrors the first one but I change
the value of a field. Then I call IndexWriter.updateDocument() which is a
delete and an add.

I then get a NRT reader with applyDeletes set to false and do a
MatchAllDocsQuery search. I would expect to get 2 documents back: the
current one and the updated one. But I only get back the updated one.

But I know in real code with 1000's of documents flying into the index that
I have gotten deleted documents returned.

Can someone explain to me why my small test program doesn't get the deleted
documents back?

Stuart M Goldberg

Senior Vice President of Software Development
*FIX Flyer LLC*
http://www.FIXFlyer.com/ <http://www.fixflyer.com/>

NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT ONLY FOR THE INTENDED
RECIPIENT(S) OF THE TRANSMISSION, AND CONTAINS CONFIDENTIAL INFORMATION
WHICH IS PROPRIETARY TO FIX FLYER LLC. ANY UNAUTHORIZED USE, COPYING,
DISTRIBUTION, OR DISSEMINATION IS STRICTLY PROHIBITED. ALL RIGHTS TO THIS
INFORMATION ARE RESERVED BY FIX FLYER LLC. IF YOU ARE NOT THE INTENDED
RECIPIENT, PLEASE CONTACT THE SENDER BY REPLY EMAIL AND PLEASE DELETE THIS
E-MAIL FROM YOUR SYSTEM AND DESTROY ANY COPIES.

-- 
*Notice to Recipient*: https://www.fixflyer.com/disclaimer 
<https://www.fixflyer.com/disclaimer>

Re: Deleted documents and NRT Readers

Posted by Stuart Goldberg <sg...@fixflyer.com>.
Version 4.10.4. Sorry we are woefully behind.

Stuart M Goldberg



On Fri, Jul 20, 2018 at 1:10 PM Michael McCandless <
lucene@mikemccandless.com> wrote:

> Yeah it is surprising that Lucene applied that one delete when you said it
> didn't have to.
>
> Which Lucene version?
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Deleted documents and NRT Readers

Posted by Michael McCandless <lu...@mikemccandless.com>.
Yeah it is surprising that Lucene applied that one delete when you said it
didn't have to.

Which Lucene version?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 19, 2018 at 5:54 PM, Stuart Goldberg <sg...@fixflyer.com>
wrote:

> Understood. But I would think that in a tiny program where I add one
> document and then update it, that the load is so small that it for sure
> would not have applied the delete.
>
> Why am I wrong in thinking this?

Re: Deleted documents and NRT Readers

Posted by Stuart Goldberg <sg...@fixflyer.com>.
Understood. But I would think that in a tiny program where I add one
document and then update it, that the load is so small that it for sure
would not have applied the delete.

Why am I wrong in thinking this?

On Thu, Jul 19, 2018, 5:50 PM Michael McCandless <lu...@mikemccandless.com>
wrote:

> Passing applyDeletes=false means Lucene does not have to apply all of its
> buffered deletes.
>
> But, it still may have already applied some deletes, so there's no
> guarantee that it won't have applied deletes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Deleted documents and NRT Readers

Posted by Michael McCandless <lu...@mikemccandless.com>.
Passing applyDeletes=false means Lucene does not have to apply all of its
buffered deletes.

But, it still may have already applied some deletes, so there's no
guarantee that it won't have applied deletes.
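
To make the distinction concrete, here is a toy model of that contract. It is not Lucene code and says nothing about Lucene's internals: ToyWriter and all of its methods are hypothetical, and the moment at which a real writer applies deletes on its own is stood in for by an explicit applyBufferedDeletes() call. The only point is that a reader opened with the flag set to false may see a deleted document or may not, depending on what the writer has already done.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these types or methods are Lucene APIs. */
final class ToyWriter {
    private final List<String> ids = new ArrayList<>();
    private final List<String> values = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();
    // Each pending delete remembers how many docs existed when it was issued,
    // so it never removes the replacement added by the same update.
    private final List<String> pendingIds = new ArrayList<>();
    private final List<Integer> pendingUpTo = new ArrayList<>();

    void add(String id, String value) {
        ids.add(id);
        values.add(value);
        deleted.add(false);
    }

    /** An update is modeled as a buffered delete followed by an add. */
    void update(String id, String value) {
        pendingIds.add(id);
        pendingUpTo.add(ids.size());
        add(id, value);
    }

    /** Stands in for the writer resolving buffered deletes on its own schedule. */
    void applyBufferedDeletes() {
        for (int p = 0; p < pendingIds.size(); p++) {
            for (int doc = 0; doc < pendingUpTo.get(p); doc++) {
                if (ids.get(doc).equals(pendingIds.get(p))) {
                    deleted.set(doc, true);
                }
            }
        }
        pendingIds.clear();
        pendingUpTo.clear();
    }

    /** Returns the values a reader opened right now would see. */
    List<String> openReader(boolean applyAllDeletes) {
        if (applyAllDeletes) {
            applyBufferedDeletes();
        }
        List<String> visible = new ArrayList<>();
        for (int doc = 0; doc < ids.size(); doc++) {
            if (!deleted.get(doc)) {
                visible.add(values.get(doc));
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        w.add("doc-1", "original");
        w.update("doc-1", "updated");
        System.out.println(w.openReader(false)); // [original, updated]: delete still buffered
        w.applyBufferedDeletes();                // writer applied it for its own reasons
        System.out.println(w.openReader(false)); // [updated]: same flag, delete already applied
    }
}
```

In this model, passing true always yields only the live copy, while passing false yields whatever state the writer happens to be in.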

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 19, 2018 at 3:23 PM, Stuart Goldberg <sg...@fixflyer.com>
wrote:

> I use NRT readers all the time. I create them with 'applyDeletes' set to
> false for performance reasons and take the javadoc at its word that my code
> has to be prepared to deal with deleted documents. I thought I understood
> that and I wrote my code to be deleted-document-safe.
>
> But I have recently revisited the issue and tried to understand what
> happens using a little test program. I create a document and add it to the
> index. I then create a new document that mirrors the first one but I change
> the value of a field. Then I call IndexWriter.updateDocument() which is a
> delete and an add.
>
> I then get a NRT reader with applyDeletes set to false and do a
> MatchAllDocsQuery search. I would expect to get 2 documents back: the
> current one and the updated one. But I only get back the updated one.
>
> But I know in real code with 1000's of documents flying into the index that
> I have gotten deleted documents returned.
>
> Can someone explain to me why my small test program doesn't get the deleted
> documents back?