Posted to dev@lucene.apache.org by "Doron Cohen (JIRA)" <ji...@apache.org> on 2007/02/21 09:15:05 UTC

[jira] Created: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

bufferDeleteTerm in IndexWriter might flush prematurely
-------------------------------------------------------

                 Key: LUCENE-808
                 URL: https://issues.apache.org/jira/browse/LUCENE-808
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.1
            Reporter: Doron Cohen


Successive calls to delete by the same term increment numBufferedDeleteTerms,
although all but the first are no-ops if no docs were added in between. Hence deletes
are flushed too soon.

It is a minor problem and should be rare, but it seems cleaner to fix it.

Attached patch also fixes TestIndexWriterDelete.testNonRAMDelete() which somehow
relied on this behavior.  All tests pass.
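
To make the report concrete, here is a minimal sketch of the two counting policies under discussion. It is a toy model, not Lucene's code and not the attached patch: the class DeleteBufferSketch, its Policy enum, the string terms, and the rule that a flush fires once the counter reaches the threshold are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a buffered-delete counter; hypothetical, not Lucene's IndexWriter. */
class DeleteBufferSketch {
    enum Policy { SIMPLE_SUM, SKIP_REPEATS }

    private final Policy policy;
    private final int maxBufferedDeleteTerms; // assumed flush threshold
    private final Set<String> deletedSinceLastAdd = new HashSet<>();
    private int numBufferedDeleteTerms = 0;
    private int flushCount = 0;

    DeleteBufferSketch(Policy policy, int maxBufferedDeleteTerms) {
        this.policy = policy;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    /** A new doc might match any term, so later deletes are no longer known no-ops. */
    void addDocument() {
        deletedSinceLastAdd.clear();
    }

    void deleteByTerm(String term) {
        // add() returns false when the term was already deleted since the last add.
        boolean repeat = !deletedSinceLastAdd.add(term);
        if (policy == Policy.SKIP_REPEATS && repeat) {
            return; // nothing new to delete, so do not count it
        }
        numBufferedDeleteTerms++;
        if (numBufferedDeleteTerms >= maxBufferedDeleteTerms) {
            flushCount++;               // stand-in for applying the buffered deletes
            numBufferedDeleteTerms = 0;
        }
    }

    int flushCount() {
        return flushCount;
    }

    /** Deletes the same term three times with a threshold of three. */
    static int flushesAfterRepeatedDeletes(Policy policy) {
        DeleteBufferSketch buffer = new DeleteBufferSketch(policy, 3);
        for (int i = 0; i < 3; i++) {
            buffer.deleteByTerm("id:5");
        }
        return buffer.flushCount();
    }

    public static void main(String[] args) {
        System.out.println("simple sum flushes:   " + flushesAfterRepeatedDeletes(Policy.SIMPLE_SUM));
        System.out.println("skip repeats flushes: " + flushesAfterRepeatedDeletes(Policy.SKIP_REPEATS));
    }
}
```

In this model the simple-sum policy reaches the threshold on repeated deletes of one term, while the skip-repeats policy counts the term once and does not flush.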

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Created: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by Ning Li <ni...@gmail.com>.
The code correctly reflects its designed semantics:
numBufferedDeleteTerms is a simple sum of terms passed to
updateDocument or deleteDocuments.

If the first of two successive calls with the same term should be
considered a no-op if no docs were added in between, shouldn't the first
also be considered a no-op if the docs added in between do not contain
the term? Whether a doc contains a term, however, can only be
determined at the time of actual deletion for performance reasons.

Thus I think the original semantics is cleaner.

Ning



Re: [jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by Ning Li <ni...@gmail.com>.
On 2/21/07, Doron Cohen (JIRA) <ji...@apache.org> wrote:
> Imagine the application and Lucene could talk, with the current
> implementation we could hear something like this: ...

However, there could be multiple threads updating the same index. For
example, thread 1 deletes the term "id:5" twice, thread 2 inserts a
document with "id:10". The following two are among the possible
execution sequences:
Sequence 1:
  thread 1 deletes "id:5"
  thread 1 deletes "id:5"
  thread 2 inserts document "id:10"
Sequence 2:
  thread 1 deletes "id:5"
  thread 2 inserts document "id:10"
  thread 1 deletes "id:5".

They should return the same numBufferedDeleteTerms, not different ones.
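
The two sequences can be replayed against both counting policies with a small sketch. This is an illustrative model only: the class SequenceCountSketch, the "delete term" / "insert term" string encoding, and the skipRepeats flag are hypothetical names, and the skip-repeats rule is one reading of the proposed change rather than the actual patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Replays an interleaving of operations and counts buffered delete terms. */
class SequenceCountSketch {
    static final List<String> SEQUENCE_1 =
        Arrays.asList("delete id:5", "delete id:5", "insert id:10");
    static final List<String> SEQUENCE_2 =
        Arrays.asList("delete id:5", "insert id:10", "delete id:5");

    /**
     * With skipRepeats false, every delete is counted (simple sum).
     * With skipRepeats true, a delete repeated with no insert in between is not counted.
     */
    static int countBufferedDeletes(List<String> ops, boolean skipRepeats) {
        Set<String> deletedSinceLastInsert = new HashSet<>();
        int numBufferedDeleteTerms = 0;
        for (String op : ops) {
            if (op.startsWith("insert ")) {
                deletedSinceLastInsert.clear();
            } else if (op.startsWith("delete ")) {
                String term = op.substring("delete ".length());
                boolean repeat = !deletedSinceLastInsert.add(term);
                if (!(skipRepeats && repeat)) {
                    numBufferedDeleteTerms++;
                }
            }
        }
        return numBufferedDeleteTerms;
    }

    public static void main(String[] args) {
        System.out.println("simple sum:   " + countBufferedDeletes(SEQUENCE_1, false)
            + " vs " + countBufferedDeletes(SEQUENCE_2, false));
        System.out.println("skip repeats: " + countBufferedDeletes(SEQUENCE_1, true)
            + " vs " + countBufferedDeletes(SEQUENCE_2, true));
    }
}
```

Under the simple sum both interleavings give the same count; under the skip-repeats rule the count depends on how the threads happen to interleave.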

Ning



[jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474808 ] 

Doron Cohen commented on LUCENE-808:
------------------------------------

Ning Li wrote:

> The code correctly reflects its designed semantics:
> numBufferedDeleteTerms is a simple sum of terms passed to
> updateDocument or deleteDocuments.
> 
> If the first of two successive calls with the same term should be
> considered a no-op if no docs were added in between, shouldn't the first
> also be considered a no-op if the docs added in between do not contain
> the term? Whether a doc contains a term, however, can only be
> determined at the time of actual deletion for performance reasons.
> 
> Thus I think the original semantics is cleaner.

I agree, the code is correct for a 'simple sum' semantics. 

Looking at the javadocs for setMaxBufferedDeleteTerms(), it says: 
"minimal number of delete terms". To me, this reads like: "minimal 
number of (actual) delete terms".

But beyond one definition or another, I guess the question should be
what would application developers expect. For an operation that is 
clearly a no-op, wouldn't they expect no side effects?

As an example, if an application calls IndexWriter.flush() twice
in a row, the second call is a no-op and has no side effects.

Similarly, when editing a document or file, clicking "save" will 
do nothing in case there are no changes (otherwise users would be
quite surprised).

Imagine the application and Lucene could talk. With the current
implementation we might hear something like this:

  [applic] <calling del-by-term>;
  [lucene] <increment buf-del-terms-counter>;
  [applic] <searching>; "why on earth weren't these docs deleted?"
  [applic] <calling del-by-term again for same term>;
  [lucene] <incrementing buf-del-terms-counter again; merging>;
  [applic] <searching>; "that's better! mmm... I wonder why the 
           first delete of this term didn't do it... Was there
           any difference between these calls?"





[jira] Assigned: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-808:
----------------------------------

    Assignee: Doron Cohen



[jira] Updated: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-808:
-------------------------------

    Priority: Minor  (was: Major)



[jira] Updated: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-808:
-------------------------------

    Attachment: successive_bufferDeleteTerm.patch



[jira] Resolved: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-808.
--------------------------------

       Resolution: Invalid
    Lucene Fields: [Patch Available]  (was: [New, Patch Available])

No one but me considers this behavior a problem, so I am closing it.



[jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474924 ] 

Doron Cohen commented on LUCENE-808:
------------------------------------

[ moving discussion back to JIRA ]

Ning Li wrote:

> On 2/21/07, Doron Cohen (JIRA) <ji...@apache.org> wrote:
> > Imagine the application and Lucene could talk, with the current
> > implementation we could hear something like this: ...
> 
> However, there could be multiple threads updating the same index. For
> example, thread 1 deletes the term "id:5" twice, thread 2 inserts a
> document with "id:10". The following two are among the possible
> execution sequences:
> Sequence 1:
>   thread 1 deletes "id:5"
>   thread 1 deletes "id:5"
>   thread 2 inserts document "id:10"
> Sequence 2:
>   thread 1 deletes "id:5"
>   thread 2 inserts document "id:10"
>   thread 1 deletes "id:5".
> 
> They should return the same numBufferedDeleteTerms, not different ones.

Nice example Ning!

Mmmm... I am still not convinced... :-) 

Assume the inserted document had "id:5" instead. Then after sequence 1
there would be a doc with "id:5" in the index, but after sequence 2
there would not be such a doc. The value of numDocs() would differ
between the two sequences, so why should numBufferedDeleteTerms be the same?
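
A small sketch shows how the index contents diverge when the inserted document carries "id:5". It is a toy model under stated assumptions: the class IndexStateSketch and the string encoding of operations are hypothetical, each document is reduced to its single id term, and every delete is applied immediately to the docs added before it, which stands in for the end result of buffered deletes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Replays an interleaving and reports how many docs remain in a toy index. */
class IndexStateSketch {
    static final List<String> SEQUENCE_1 =
        Arrays.asList("delete id:5", "delete id:5", "insert id:5");
    static final List<String> SEQUENCE_2 =
        Arrays.asList("delete id:5", "insert id:5", "delete id:5");

    static int numDocsAfter(List<String> ops) {
        List<String> docs = new ArrayList<>(); // each doc is represented by its id term
        for (String op : ops) {
            String term = op.substring(op.indexOf(' ') + 1);
            if (op.startsWith("insert ")) {
                docs.add(term);
            } else if (op.startsWith("delete ")) {
                // A delete only affects docs that were added before it.
                docs.removeIf(term::equals);
            }
        }
        return docs.size();
    }

    public static void main(String[] args) {
        System.out.println("docs after sequence 1: " + numDocsAfter(SEQUENCE_1));
        System.out.println("docs after sequence 2: " + numDocsAfter(SEQUENCE_2));
    }
}
```

In this model the document survives sequence 1 and is removed by the trailing delete in sequence 2, so the two interleavings already differ in observable state.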

Anyhow, even if we agreed that this is a problem, I think it
is a minor one, and I am ok with deciding to leave things as they
are. Having written this piece from the start, you may see internal
logic that I don't see.

Let's give it a few days, perhaps get comments from others, 
(perhaps change our mind about it :-) ).  If nothing changes 
I think I will set "won't fix".

Regards,
Doron

