You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shay Banon (JIRA)" <ji...@apache.org> on 2009/06/29 09:12:47 UTC

[jira] Created: (LUCENE-1721) IndexWriter to allow deletion by doc ids

IndexWriter to allow deletion by doc ids
----------------------------------------

                 Key: LUCENE-1721
                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Shay Banon


It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.

More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728109#action_12728109 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

Looks like its actually a SegmentReader

old index:   org.apache.lucene.index.SegmentReader@f38cf0
new index: org.apache.lucene.index.SegmentReader@7bfc04

i guess that in 2.4, if you have an optimized index, and you call reopen(), you will always get a new SegmentReader instance (so the == check isn't really valid)
i guess this is a non-issue in 2.9 (opened index is never a SegmentReader, even in the optimized case)

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727620#action_12727620 ] 

Yonik Seeley commented on LUCENE-1721:
--------------------------------------

bq. obviously, this is rather impractical as this would require opening a SegmentReader for each and every segment for each delete given the current implementation of handling deletes (causing this special delete to be extremely slow)

Right - seems better to be handled at the application level.  Solr used to do it's own delete handling before switching to using Lucene's delete-by-term - something like that would give you more control.

bq. If the index only receives deletes that do not match the index, the application can be aware of this and prevent performing a commit (thereby removing the need for application specific cache loading/analytics/etc performed after a "noop" commit).

Sounds like you could perhaps use reopen() or the NRT (near realtime) patches - get a new reader, see what segments changed (if any).

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727623#action_12727623 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

bq. Sounds like you could perhaps use reopen() or the NRT (near realtime) patches - get a new reader, see what segments changed (if any).

Yeah, i'm aware of all these (and will take advantage of them once 2.9 is out), but some custom caches may not work on a per segment basis (may be much slower when evaluated this way, or may not be able to be constructed in the first place without full index as a context)

was just throwing out that use case, as it is one i have run into and is one closely related to wanting to delete documents by internal id.

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727614#action_12727614 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

One thing that would be nice to see is a boolean return value from "deleteByTerm" that would indicate if any documents will actually be deleted (or even more, the count of docs that would be deleted)

This would allow an application to perform more optimal commits/optimizes. If the index only receives deletes that do not match the index, the application can be aware of this and prevent performing a commit (thereby removing the need for application specific cache loading/analytics/etc performed after a "noop" commit).  This is extremely useful in the event where TONS of deletes will be received, but very few updates with frequent commit intervals

obviously, this is rather impractical as this would require opening a SegmentReader for each and every segment for each delete given the current implementation of handling deletes (causing this special delete to be extremely slow)

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727646#action_12727646 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

Absolutely nothing would have to have actually changed in order to reuse an old IndexReader.
Is there a simple way to identify if this is true? Or at least to identity if the commit/optimize resulted in absolutely no change? reliably?

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728098#action_12728098 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

reopen() actually doesn't do the trick

the following situation will always result in reopen() returning a different IndexReader instance (at least in 2.4):
* commit()
* open IndexReader
* delete(docid not in index)
* commit() - creates new segments_N file (even though no files actually changed)
* reopen() IndexReader - returns new instance, because of new segments_N file

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727639#action_12727639 ] 

Jason Rutherglen commented on LUCENE-1721:
------------------------------------------

I'm still of the somewhat naive opinion delete by doc id is
possible simply using IR.deleteDocument on a IW.getReader. The
reason is we can maintain a doc id mapping/genealogy as segments
are merged. We'd simply keep a chain of child readers in a
reader, a delete would cascade through it.

I'm curious as to what other use cases there are for this, maybe
editing field caches? 

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725140#action_12725140 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

how about a delete method on the IndexWriter that takes a segment number and a document id

it would also be required to add methods to the IndexReader to get the segment number and local document id for a docid, but this should then work just fine

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728104#action_12728104 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

I'm working in lucene 2.4, so i don't have access to the "child" IndexReaders
however i'm 99.99999% positive this is the case (or some similar event) (especially as my index is optimized at this point)

my next though was to inspect the IndexCommit for each and validate that they contain the same exact files (with same exact timestamps)
this is just more work (need to save of the IndexCommit at open time to make sure the timestamps will be captured at open time, then compare that with the new one (and there's no convenience method on IndexCommit for this type of equality check)


> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728141#action_12728141 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

IndexReader.reopen() solves my use case for wanting this functionality

I suppose if anyone else has a use case for this functionality, they should add it

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728113#action_12728113 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

Actually, it looks like i am seeing 2 separate segments at reopen() time (_3, then _4), however at this time my index should have been optimized
will track this down (probably just a problem with my test)

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727683#action_12727683 ] 

Yonik Seeley commented on LUCENE-1721:
--------------------------------------

bq. Absolutely nothing would have to have actually changed in order to reuse an old IndexReader.

Right - see the javadoc for reopen() then... it currently says that if nothing has changed, you get the same IndexReader instance back.


> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727641#action_12727641 ] 

Yonik Seeley commented on LUCENE-1721:
--------------------------------------

bq. but some custom caches may not work on a per segment basis

Just because you get a new IndexReader doesn't mean you have to use it, if not enough (or nothing) changed.

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728120#action_12728120 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

found my problem, i wasn't doing a commit() after the optimize() call, so an index opened after optimize() was not the optimized index
reopen() works like a champ now
the javadoc for optimize() isn't fully clear on these semantics

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727695#action_12727695 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

That looks like its pretty close, and is definitely better than assuming the index changed, but still not "exactly" what would be desired (at least by me).
looks IndexReader.reopen() will give a new IndexReader instance even if there are no "actual" changes to the index (deletes/new segments) in the event that some background merges were performed. This can result in a new IndexReader being opened (via reopen()) even if no content changed. (while this will be a more optimal search index, it will invalidate caches that will then need to be reloaded) 

a return status from commit would be more desirable
such a status could list the number new documents, deleted documents, and new segments/merged segments, and so on
That way, an application could choose what criteria would facilitate opening a new index
(this is going even more off topic though)


> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725141#action_12725141 ] 

Tim Smith commented on LUCENE-1721:
-----------------------------------

i suppose even that approach would cause problems if segments merge between getting the segment number/local doc pair and actuallly asking for the delete

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725109#action_12725109 ] 

Michael McCandless commented on LUCENE-1721:
--------------------------------------------

This is a frequently requested feature, and I agree it'd be useful, but the problem is docID is in general not usable in the context of a writer since docIDs shift when segments that have deletions are committed.

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728100#action_12728100 ] 

Yonik Seeley commented on LUCENE-1721:
--------------------------------------

This is presumably a MultiReader that is returned... can you check to see if it contains the same segment reader instances (it should)? 

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727610#action_12727610 ] 

Michael McCandless commented on LUCENE-1721:
--------------------------------------------

Right, a merge can commit at any time.

Since a writer can change docIDs at any time, it's never safe for something outside to hold onto docIDs; this is what's good about delete by Term and by Query: they are "late binding", ie, only resolve to docID at the exact moment deletion is done.

> IndexWriter to allow deletion by doc ids
> ----------------------------------------
>
>                 Key: LUCENE-1721
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1721
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. It makes sense for cases where a "query" has been executed beforehand, and later, that query needs to be applied in order to delete the matched documents.
> More information here: http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org