Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/11 17:47:03 UTC

[Lucene-java Wiki] Update of "LuceneFAQ" by MikeMcCandless

The following page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/LuceneFAQ

The comment on the change is:
Modernize "How do I delete documents from the index?"

------------------------------------------------------------------------------
  
  ==== How do I delete documents from the index? ====
  
- If you know the document number of a document (e.g. when iterating over Hits) that you want to delete you may use:
+ `IndexWriter` allows you to delete by `Term` or by `Query`.  The
+ deletes are buffered and then periodically flushed to the index, and
+ made visible once `commit()` or `close()` is called.
  
- `IndexReader.deleteDocument(docNum)`
+ `IndexReader` can also delete documents, by `Term` or document number,
+ but you must close any open `IndexWriter` before using `IndexReader`
+ to make changes (and vice versa).  `IndexReader` also buffers the
+ deletions and does not write changes to the index until `close()` is
+ called, but if you use that same `IndexReader` for searching, the
+ buffered deletions will immediately take effect.  Unlike
+ `IndexWriter`'s delete methods, `IndexReader`'s methods return the
+ number of documents that were deleted.
  
- That will delete the document numbered `docNum` from the index.  Once a document is deleted it will not appear in `TermDocs` nor `TermPositions` enumerations.
+ Generally it's best to use `IndexWriter` for deletions, unless 1) you
+ must delete by document number, 2) you need your searches to
+ immediately reflect the deletions, or 3) you must know how many
+ documents were deleted by a given `deleteDocuments` invocation.
  
- Attempts to read its field with the `document` method will result in an exception.  The presence of this document may still be reflected in the `docFreq` statistic, though this will be corrected eventually as the index is further modified.
+ If you must delete by document number but would otherwise like to use
+ `IndexWriter`, one common approach is to make a primary key field
+ that holds a unique ID string for each document.  Then you can delete
+ a single document by creating the `Term` containing the ID, and
+ passing that to `IndexWriter`'s `deleteDocuments(Term)` method.
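
The buffering and primary-key behavior described above can be illustrated with a small toy model. This is a minimal sketch in plain Java, not Lucene code: the class `BufferedDeleteSketch` and its methods are hypothetical names chosen to mirror `deleteDocuments(Term)` and `commit()`, and documents are reduced to their unique ID strings. It only shows that a delete by ID stays invisible to searchers until the next commit.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only (hypothetical class, NOT part of Lucene): each document is
// represented by the unique ID string stored in its primary key field.
class BufferedDeleteSketch {
    // IDs that searchers can currently see (state as of the last commit).
    private final Set<String> visibleIds = new LinkedHashSet<>();
    // IDs including buffered adds and deletes that are not yet committed.
    private final Set<String> workingIds = new LinkedHashSet<>();

    // Buffers an add; it is not visible until commit().
    void addDocument(String id) {
        workingIds.add(id);
    }

    // Mirrors deleting by a Term on the ID field: buffered, returns no count.
    void deleteDocuments(String id) {
        workingIds.remove(id);
    }

    // Makes all buffered adds and deletes visible to searchers.
    void commit() {
        visibleIds.clear();
        visibleIds.addAll(workingIds);
    }

    boolean isVisible(String id) {
        return visibleIds.contains(id);
    }

    public static void main(String[] args) {
        BufferedDeleteSketch writer = new BufferedDeleteSketch();
        writer.addDocument("doc-1");
        writer.addDocument("doc-2");
        writer.commit();

        writer.deleteDocuments("doc-1");
        // The delete is only buffered here, so doc-1 is still visible.
        System.out.println("before commit: " + writer.isVisible("doc-1"));

        writer.commit();
        // After the commit the delete has taken effect.
        System.out.println("after commit: " + writer.isVisible("doc-1"));
    }
}
```

In real code the same pattern would build a `Term` from the ID field name and the ID string and pass it to `IndexWriter`'s `deleteDocuments(Term)`, followed by `commit()` or `close()`; the exact constructors depend on the Lucene version in use.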
  
+ Once a document is deleted it will not appear in `TermDocs` or
+ `TermPositions` enumerations, or in any search results.  Attempts to
+ load the document will result in an exception.  The presence of this
+ document may still be reflected in the `docFreq` statistics, and thus
+ alter search scores, though this will be corrected eventually as
+ segments containing deletions are merged.
- If you want to delete all (one or more) documents that contain a specific term you may use:
- 
- `IndexReader.deleteDocuments(term)`
- 
- This is useful if one uses a document field to hold a unique ID string for
- the document.  Then to delete such a document, one merely constructs a
- term with the appropriate field and the unique ID string as its text and
- passes it to this method. Because a variable number of documents can be affected by this method call this method returns the number of documents deleted.
- 
- Starting with Lucene 1.9, the new class `IndexModifier` also allows deleting documents.
  
  
  ==== Is there a way to limit the size of an index? ====