You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2009/09/26 01:56:16 UTC
[jira] Updated: (LUCENE-1526) Tombstone deletions in IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1526:
-------------------------------------
Attachment: LUCENE-1526.patch
This moves us on our way more efficient memory usage in heavy
near realtime search apps, because we don't have to reallocate
an entire byte[] equals to the maxDoc of the segment(s) that
have new deletes.
* A 2 dimensional byte array is used where each actual array is
1024 in length.
* A refs boolean array keeps track of the which arrays need to
be copied when a bit is set (copy on ref).
* The code can be benchmarked against the existing BV
implementation for any possible speed slowdown due to the extra
array lookup (probably minimal to nothing)
* Need to implement the dgaps encoding
* Code isn't committable
* Still need to run all tests, however TestMultiBitVector passes
> Tombstone deletions in IndexReader
> ----------------------------------
>
> Key: LUCENE-1526
> URL: https://issues.apache.org/jira/browse/LUCENE-1526
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Minor
> Attachments: LUCENE-1526.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy on write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey. Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader.
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to by ANDed together as the current reader's
> delete docs.
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org