Posted to dev@lucene.apache.org by "Doron Cohen (JIRA)" <ji...@apache.org> on 2007/02/21 20:56:06 UTC

[jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

    [ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474808 ] 

Doron Cohen commented on LUCENE-808:
------------------------------------

Ning Li wrote:

> The code correctly reflects its designed semantics:
> numBufferedDeleteTerms is a simple sum of terms passed to
> updateDocument or deleteDocuments.
> 
> If the first of two successive calls to the same term should be
> considered no op if no docs were added in between, shouldn't the first
> also be considered no op if the docs added in between do not contain
> the term? Whether a doc contains a term, however, can only be
> determined at the time of actual deletion for performance reasons.
> 
> Thus I think the original semantics is cleaner.

I agree, the code is correct under 'simple sum' semantics.

The javadocs for setMaxBufferedDeleteTerms() say:
"minimal number of delete terms". To me, this reads like: "minimal
number of (actual) delete terms".

But beyond one definition or another, I guess the question should be
what application developers would expect. For an operation that is
clearly a no-op, wouldn't they expect no side effects?

As an example, if an application calls IndexWriter.flush() twice
in a row, the second call is a no-op and has no side effects.

Similarly, when editing a document or file, clicking "save" will 
do nothing in case there are no changes (otherwise users would be
quite surprised).

Imagine the application and Lucene could talk, with the current 
implementation we could hear something like this:

  [applic] <calling del-by-term>;
  [lucene] <increment buf-del-terms-counter>;
  [applic] <searching>; "why on earth weren't these docs deleted?"
  [applic] <calling del-by-term again for same term>;
  [lucene] <incrementing buf-del-terms-counter again; merging>;
  [applic] <searching>; "that's better! mmm... I wonder why the 
           first delete of this term didn't do it... Was there
           any difference between these calls?"
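
The exchange above can be reproduced with a toy model. The sketch below
is not Lucene's IndexWriter code: the class SimpleSumDeleteBuffer, its
methods, and the rule "flush once the counter reaches the threshold" are
assumptions made for illustration. It only shows how a counter that adds
one per call lets a repeated delete of the same term trigger a flush.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a delete-term buffer (hypothetical, not Lucene code). */
final class SimpleSumDeleteBuffer {
    // Assumed threshold, named after setMaxBufferedDeleteTerms().
    private final int maxBufferedDeleteTerms;
    private final Set<String> bufferedTerms = new HashSet<>();
    private int numBufferedDeleteTerms = 0;
    private int flushCount = 0;

    SimpleSumDeleteBuffer(int maxBufferedDeleteTerms) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    /** Buffers a delete; returns true if this call triggered a flush. */
    boolean deleteByTerm(String term) {
        bufferedTerms.add(term);
        // Simple sum: every call counts, even a repeat of a buffered term.
        numBufferedDeleteTerms++;
        if (numBufferedDeleteTerms >= maxBufferedDeleteTerms) {
            flush();
            return true;
        }
        return false;
    }

    /** Stands in for applying the buffered deletes; here it only resets state. */
    private void flush() {
        bufferedTerms.clear();
        numBufferedDeleteTerms = 0;
        flushCount++;
    }

    int getFlushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        SimpleSumDeleteBuffer buffer = new SimpleSumDeleteBuffer(2);
        System.out.println(buffer.deleteByTerm("id:5")); // prints false
        System.out.println(buffer.deleteByTerm("id:5")); // prints true
    }
}
```

With a threshold of 2, only one distinct term is buffered when the
second call flushes, which is the behavior the issue title calls
premature.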



> bufferDeleteTerm in IndexWriter might flush prematurely
> -------------------------------------------------------
>
>                 Key: LUCENE-808
>                 URL: https://issues.apache.org/jira/browse/LUCENE-808
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>         Attachments: successive_bufferDeleteTerm.patch
>
>
> Successive calls to remove-by-the-same-term would increment numBufferedDeleteTerms
> although all but the first are no op if no docs were added in between. Hence deletes would
> be flushed too soon.
> It is a minor problem, should be rare, but it seems cleaner to fix this. 
> Attached patch also fixes TestIndexWriterDelete.testNonRAMDelete() which somehow
> relied on this behavior.  All tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

Posted by Ning Li <ni...@gmail.com>.
On 2/21/07, Doron Cohen (JIRA) <ji...@apache.org> wrote:
> Imagine the application and Lucene could talk, with the current
> implementation we could hear something like this: ...

However, there could be multiple threads updating the same index. For
example, thread 1 deletes the term "id:5" twice while thread 2 inserts
a document with "id:10". The following two are among the possible
execution sequences:
Sequence 1:
  thread 1 deletes "id:5"
  thread 1 deletes "id:5"
  thread 2 inserts document "id:10"
Sequence 2:
  thread 1 deletes "id:5"
  thread 2 inserts document "id:10"
  thread 1 deletes "id:5"

They should return the same numBufferedDeleteTerms, not different ones.
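
The two sequences can be checked against both counting rules with a
small simulation. Everything in this sketch is hypothetical: the class
DeleteCountSemantics, the string encoding of operations, and in
particular skipRepeats(), which is only one possible reading of "a
repeated delete is a no-op if no docs were added in between" and is not
taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Compares two ways of counting buffered delete terms (toy model only). */
final class DeleteCountSemantics {

    /** Simple sum: one increment per delete call, regardless of order. */
    static int simpleSum(String[] ops) {
        int count = 0;
        for (String op : ops) {
            if (op.startsWith("delete ")) {
                count++;
            }
        }
        return count;
    }

    /**
     * Assumed alternative: a delete is not counted when the same term is
     * already buffered and no document has been added since.
     */
    static int skipRepeats(String[] ops) {
        // term -> number of docs added at the time the term was last buffered
        Map<String, Integer> docsAtDelete = new HashMap<>();
        int docsAdded = 0;
        int count = 0;
        for (String op : ops) {
            if (op.startsWith("delete ")) {
                String term = op.substring("delete ".length());
                Integer previous = docsAtDelete.put(term, docsAdded);
                if (previous == null || previous != docsAdded) {
                    count++;
                }
            } else {
                docsAdded++; // any other op is treated as adding a document
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] sequence1 = {"delete id:5", "delete id:5", "add id:10"};
        String[] sequence2 = {"delete id:5", "add id:10", "delete id:5"};

        System.out.println(simpleSum(sequence1));   // prints 2
        System.out.println(simpleSum(sequence2));   // prints 2
        System.out.println(skipRepeats(sequence1)); // prints 1
        System.out.println(skipRepeats(sequence2)); // prints 2
    }
}
```

Under the simple sum both interleavings give the same count, while the
assumed skip-repeats rule makes the count depend on how the threads
happen to interleave.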

Ning