Posted to dev@lucene.apache.org by "Doron Cohen (JIRA)" <ji...@apache.org> on 2007/02/21 09:15:05 UTC
[jira] Created: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
bufferDeleteTerm in IndexWriter might flush prematurely
-------------------------------------------------------
Key: LUCENE-808
URL: https://issues.apache.org/jira/browse/LUCENE-808
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.1
Reporter: Doron Cohen
Successive calls to delete by the same term each increment numBufferedDeleteTerms,
although all but the first are no-ops if no docs were added in between. Hence deletes
can be flushed too soon.
It is a minor problem and should be rare, but it seems cleaner to fix it.
The attached patch also fixes TestIndexWriterDelete.testNonRAMDelete(), which somehow
relied on this behavior. All tests pass.
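To make the reported behavior concrete, here is a minimal toy model of "count every call" buffering. It is a sketch under stated assumptions, not IndexWriter's code: the class SimpleSumDeleteBuffer, its flush-at-threshold rule, and the accessor names are hypothetical and only borrow the counter name from the report.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "simple sum" counting of buffered delete terms; NOT Lucene's IndexWriter. */
final class SimpleSumDeleteBuffer {
    private final int maxBufferedDeleteTerms;           // assumed flush threshold
    private final List<String> bufferedTerms = new ArrayList<>();
    private int numBufferedDeleteTerms = 0;
    private int flushCount = 0;

    SimpleSumDeleteBuffer(int maxBufferedDeleteTerms) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    /** Buffers a delete-by-term. Every call counts, even a repeat of the same term. */
    void bufferDeleteTerm(String term) {
        bufferedTerms.add(term);
        numBufferedDeleteTerms++;
        if (numBufferedDeleteTerms >= maxBufferedDeleteTerms) {
            flush();
        }
    }

    /** Pretends to apply the buffered deletes, then resets the counter. */
    private void flush() {
        bufferedTerms.clear();
        numBufferedDeleteTerms = 0;
        flushCount++;
    }

    int numBufferedDeleteTerms() { return numBufferedDeleteTerms; }

    int flushCount() { return flushCount; }

    public static void main(String[] args) {
        SimpleSumDeleteBuffer buffer = new SimpleSumDeleteBuffer(3);
        // Three deletes of the same term with no adds in between.
        for (int i = 0; i < 3; i++) {
            buffer.bufferDeleteTerm("id:5");
        }
        System.out.println("flushes = " + buffer.flushCount()); // prints "flushes = 1"
    }
}
```

In this toy model, a threshold of 3 is reached by one distinct term deleted three times, which is the "flushed too soon" case the report describes.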
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: [jira] Created: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by Ning Li <ni...@gmail.com>.
The code correctly reflects its designed semantics:
numBufferedDeleteTerms is a simple sum of terms passed to
updateDocument or deleteDocuments.
If the first of two successive calls with the same term should be
considered a no-op when no docs were added in between, shouldn't the first
also be considered a no-op when the docs added in between do not contain
the term? For performance reasons, however, whether a doc contains a term
can only be determined at the time of actual deletion.
Thus I think the original semantics is cleaner.
Ning
Re: [jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by Ning Li <ni...@gmail.com>.
On 2/21/07, Doron Cohen (JIRA) <ji...@apache.org> wrote:
> Imagine the application and Lucene could talk, with the current
> implementation we could hear something like this: ...
However, there could be multiple threads updating the same index. For
example, thread 1 deletes the term "id:5" twice, thread 2 inserts a
document with "id:10". The following two are among the possible
execution sequences:
Sequence 1:
thread 1 deletes "id:5"
thread 1 deletes "id:5"
thread 2 inserts document "id:10"
Sequence 2:
thread 1 deletes "id:5"
thread 2 inserts document "id:10"
thread 1 deletes "id:5"
Both sequences should yield the same numBufferedDeleteTerms, not different values.
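The two sequences above can be replayed against two counting policies in a small sketch. Everything here is hypothetical: the "del:"/"add:" operation strings, the class DeleteCountPolicies, and in particular skipRepeats, which is only one reading of the proposed behavior and is not taken from the attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy comparison of two counting policies; neither is Lucene's actual code. */
final class DeleteCountPolicies {

    /** Simple sum: every delete call counts; adds do not affect the counter. */
    static int simpleSum(List<String> ops) {
        int count = 0;
        for (String op : ops) {
            if (op.startsWith("del:")) {
                count++;
            }
        }
        return count;
    }

    /**
     * Hypothetical alternative: a delete of a term already deleted since the
     * last add is treated as a no-op and is not counted.
     */
    static int skipRepeats(List<String> ops) {
        int count = 0;
        Set<String> deletedSinceLastAdd = new HashSet<>();
        for (String op : ops) {
            if (op.startsWith("add:")) {
                deletedSinceLastAdd.clear();      // an add makes later deletes meaningful again
            } else if (op.startsWith("del:")) {
                String term = op.substring(4);    // strip the "del:" prefix
                if (deletedSinceLastAdd.add(term)) {
                    count++;                      // first delete of this term since the last add
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sequence1 = Arrays.asList("del:id:5", "del:id:5", "add:id:10");
        List<String> sequence2 = Arrays.asList("del:id:5", "add:id:10", "del:id:5");
        System.out.println(simpleSum(sequence1) + " " + simpleSum(sequence2));     // prints "2 2"
        System.out.println(skipRepeats(sequence1) + " " + skipRepeats(sequence2)); // prints "1 2"
    }
}
```

Under the simple sum the counter does not depend on how the threads interleave, while the skip-repeats reading gives 1 for sequence 1 and 2 for sequence 2.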
Ning
[jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474808 ]
Doron Cohen commented on LUCENE-808:
------------------------------------
Ning Li wrote:
> The code correctly reflects its designed semantics:
> numBufferedDeleteTerms is a simple sum of terms passed to
> updateDocument or deleteDocuments.
>
> If the first of two successive calls to the same term should be
> considered no op if no docs were added in between, shouldn't the first
> also be considered no op if the docs added in between do not contain
> the term? Whether a doc contains a term, however, can only be
> determined at the time of actual deletion for performance reasons.
>
> Thus I think the original semantics is cleaner.
I agree, the code is correct for a 'simple sum' semantics.
The javadocs for setMaxBufferedDeleteTerms(), though, say
"minimal number of delete terms". To me, this reads like "minimal
number of (actual) delete terms".
But beyond one definition or another, I guess the question should be
what application developers would expect. For an operation that is
clearly a no-op, wouldn't they expect no side effects?
As an example, if an application calls IndexWriter.flush() twice
in a row, the second call is a no-op and has no side effects.
Similarly, when editing a document or file, clicking "save" does
nothing if there are no changes (otherwise users would be
quite surprised).
Imagine the application and Lucene could talk. With the current
implementation we could hear something like this:
[applic] <calling del-by-term>;
[lucene] <incrementing buf-del-terms-counter>;
[applic] <searching>; "why on earth weren't these docs deleted?"
[applic] <calling del-by-term again for same term>;
[lucene] <incrementing buf-del-terms-counter again; merging>;
[applic] <searching>; "that's better! mmm... I wonder why the
         first delete of this term didn't do it... Was there
         any difference between these calls?"
[jira] Assigned: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen reassigned LUCENE-808:
----------------------------------
Assignee: Doron Cohen
[jira] Updated: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-808:
-------------------------------
Priority: Minor (was: Major)
[jira] Updated: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-808:
-------------------------------
Attachment: successive_bufferDeleteTerm.patch
[jira] Resolved: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen resolved LUCENE-808.
--------------------------------
Resolution: Invalid
Lucene Fields: [Patch Available] (was: [New, Patch Available])
No one but me considers this behavior a problem, so I am closing the issue.
[jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474924 ]
Doron Cohen commented on LUCENE-808:
------------------------------------
[ moving discussion back to JIRA ]
Ning Li wrote:
> On 2/21/07, Doron Cohen (JIRA) <ji...@apache.org> wrote:
> > Imagine the application and Lucene could talk, with the current
> > implementation we could hear something like this: ...
>
> However, there could be multiple threads updating the same index. For
> example, thread 1 deletes the term "id:5" twice, thread 2 inserts a
> document with "id:10". The following two are among the possible
> execution sequences:
> Sequence 1:
> thread 1 deletes "id:5"
> thread 1 deletes "id:5"
> thread 2 inserts document "id:10"
> Sequence 2:
> thread 1 deletes "id:5"
> thread 2 inserts document "id:10"
> thread 1 deletes "id:5".
>
> They should return the same numBufferedDeleteTerms, not different ones.
Nice example Ning!
Mmmm... I am still not convinced... :-)
Assume instead that thread 2 inserts a doc with "id:5". Then after
sequence 1 there would be a doc with "id:5" in the index, but after
sequence 2 there would not be such a doc. NumDocs() would differ
between the two sequences. Why should numBufferedDeleteTerms be the same?
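A small replay of the two sequences with an "id:5" insert illustrates the differing outcomes. This is a toy sketch, not Lucene code: the class SequenceOutcome and the "del:"/"add:" operation strings are hypothetical, each document is modeled as a single id term, and a delete is assumed to affect only documents added before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy replay of add/delete sequences; NOT Lucene's index implementation. */
final class SequenceOutcome {

    /** Replays the operations in order and returns how many documents survive. */
    static int numDocsAfter(List<String> ops) {
        List<String> docs = new ArrayList<>();    // each doc is represented by its id term
        for (String op : ops) {
            String term = op.substring(4);        // strip the "add:" or "del:" prefix
            if (op.startsWith("add:")) {
                docs.add(term);
            } else if (op.startsWith("del:")) {
                docs.removeIf(term::equals);      // only docs added earlier are affected
            }
        }
        return docs.size();
    }

    public static void main(String[] args) {
        List<String> sequence1 = Arrays.asList("del:id:5", "del:id:5", "add:id:5");
        List<String> sequence2 = Arrays.asList("del:id:5", "add:id:5", "del:id:5");
        System.out.println(numDocsAfter(sequence1)); // prints "1"
        System.out.println(numDocsAfter(sequence2)); // prints "0"
    }
}
```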
Anyhow, even if we agreed that this is a problem, I think it
is a minor one, and I am ok with deciding to leave things as they
are. Having written this piece from the start, you may see internal
logic that I don't see.
Let's give it a few days and perhaps get comments from others
(or perhaps change our minds about it :-) ). If nothing changes
I think I will set "won't fix".
Regards,
Doron