You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Tim Smith (JIRA)" <ji...@apache.org> on 2009/06/19 22:18:07 UTC

[jira] Created: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Add deleteAllDocuments() method to IndexWriter
----------------------------------------------

                 Key: LUCENE-1705
                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
             Project: Lucene - Java
          Issue Type: Wish
          Components: Index
    Affects Versions: 2.4
            Reporter: Tim Smith


Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter

This method should have the same performance and characteristics as:
* currentWriter.close()
* currentWriter = new IndexWriter(..., create=true,...)

This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.

IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)

I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now

deleteAllDocuments() method should:
* abort any background merges (they are pointless once a deleteAll has been received)
* write new segments file referencing no segments

This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Reopened: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith reopened LUCENE-1705:
-------------------------------


Looks like i found an issue with this

The deleteAll() method isn't resetting the nextDocID on the DocumentsWriter (or some similar behaviour)

so, the following state will result in an error:
* deleteAll()
* updateDocument("5", doc)
* commit()

this results in a delete for doc "5" getting buffered, but with a very high "maxDocId"
at the same time, doc is added, however, the following will then occur on commit:
* flush segments to disk
* doc "5" is now in a segment on disk
* run deletes
* doc "5" is now blacklisted from segment 

Will work on fixing this and post a new patch (along with updated test case)

(was worried i was missing an edge case)

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1705:
------------------------------------------

    Assignee: Michael McCandless

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>         Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722021#action_12722021 ] 

Tim Smith commented on LUCENE-1705:
-----------------------------------

My use case is like so:

* IndexReader opened against old index (the last committed index)
* user requests "clearing the index" (for whatever reason, up to them to decide) 
** this will close the current IndexWriter and reopen with create=true (ideally this would just call writer.deleteAll())
** IndexWriter.init() writes out new segments.gen file referencing no segments with create=true (old segments are not yet deleted as IndexReader is still open for them)
** NOTE: new IndexReader not yet opened because they haven't said "commit" yet
** (previous commit will still be the "live" index seen by searches - obviously this means disk is not reclaimed yet)
* user may then feed new documents, or not, (up to them)
* user will then eventually say commit
* new IndexReader will be opened, which should only contain documents added since the "deleteAll()" operation

Ideally, i want to perform this delete all operation very efficiently without needing to reopen an IndexWriter 
The rollback() call would possibly slightly optimize what i am currently doing, but even better still would be a highly efficient deleteAll() method (as all the semantics of this operation would be handled nicely (without jumping through hoops to clear the index))

Creating a new IndexWriter in the same directory with create=true works like a champ, (existing IndexReaders still see the previous commit)
just want do avoid the IndexWriter.close() call



> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1705:
---------------------------------------

    Attachment: LUCENE-1705.patch

Patch looks good Tim!

I added another test case (for NRT reader), removed the un-needed notifyAll & assert, and tweaked javadocs.  I think it's ready to go in... I'll wait a day or two.

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722627#action_12722627 ] 

Tim Smith commented on LUCENE-1705:
-----------------------------------

bq. Do you wanna cons up a patch? It should be a refactoring of what's now done inside IndexWriter's init method...

Had a feeling this was coming.  

I'll give it a shot, (unless someone else volunteers)




> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722172#action_12722172 ] 

Michael McCandless commented on LUCENE-1705:
--------------------------------------------

This should be simple to implement, by taking the logic that's now in IndexWriter's init method (under the {{if (create) {}}) and pulling out into a separate method.

>From IndexWriter's standpoint, this is just another commit, on equal footing with all prior commits even though it up and deleted all segments.  So eg a deletion policy could choose to keep those past commits around, IndexReaders can open those past commits, etc.

One good functional gain by offering this method is it could be done within a single IndexWriter transaction.  Ie you could deleteAll, make further changes, etc., but then change your mind and rollback the entire transaction.  Whereas, requiring if you close & re-open the writer with create=true, you've necessarily committed the change to the index.

So I don't think we should implicitly do a commit() from within deleteAll(); I think the user should call commit() on their own.

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722006#action_12722006 ] 

Shai Erera commented on LUCENE-1705:
------------------------------------

My search app has such a scenario, and currently we just delete all the documents given a certain criteria (something similar to the above MatchAllDocsQuery. But I actually think that's the wrong approach. If you want to delete all the documents from the index, you'd better create a new one. The main reason is that if your index has, say, 10M documents, a deleteAll() will keep those 10M in the index, and when you'll re-index, the index size will be doubled. Worth still, the deleted documents may belong to segments which will not be merged/optimized right away (depends on your mergeFactor setting), and therefore will stick around for a long time (until you call optimize() or expungeDeletes()).

But, creating a new IndexWriter right away, while overriding the current one is not so smart, because your users will be left w/ no search results until the index has accumulated enough documents. Therefore I think the solution for such an approach should be:
# Call writer.rollback() - abort all current operations, cancel everything until the last commit.
# Create a new IndexWriter in a new directory and re-index everything.
# In the meantime, all your search operations go against the current index, which you know is not going to change until the other one is re-built, and therefore you can also optimize things, by opening an IndexReader and stop any accounting your code may do - just leave it open.
# When re-indexing has complete, sync all your code and:
#* Define your workDir to be the new index dir. That way new searches can begin right away on the index index)
#* Safely delete the old index dir (probably need to do something here to ensure no readers are open against this dir etc.).

That's a high-level description and I realize it may have some holes here and there, but you get the point.

If we were to create a deleteAll() method, I'd expect it to work that way. I.e., the solution you proposed above (write a new segments file referencing no segments) would prevent all searches until something new is actually re-indexed right?

I have to admit though, that I don't have an idea yet on how it can be done inside Lucene, such that new readers will see the old segments, while when I finish re-indexing and call commit, the previous segments will just be deleted.

A wild shot (and then I'll go to sleep on it) - how about if you re-index everything, not committing during that time at all. Readers that are open against the current directory will see all the documents, EXCEPT the new ones you're adding (same for new readers that you may open). When you're done re-indexing, you'll call a commitNewOnly, which will create an empty segments file and then call commit. That way, assuming you're using KeepOnlyLastCommitDeletionPolicy, after the existing readers will close, any new reader that will be opened will see the new segments only, and the next time you commit, the old segments will be deleted.

That will move the deleteAll() method to the application side, since it knows when it can safely delete all the current segments. If you don't have such a requirement (keeping an index for searches until re-indexing is complete), then I think you can safely close() the index and re-create it?

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722634#action_12722634 ] 

Shai Erera commented on LUCENE-1705:
------------------------------------

bq. I'll give it a shot, (unless someone else volunteers)

I don't mind to have a deep dive here, if you're willing to wait a couple days.

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722655#action_12722655 ] 

Tim Smith commented on LUCENE-1705:
-----------------------------------

Looking closer, rollbackTransaction() isn't the right thing to use here (but it looks pretty close to the right thing)

will work up a patch adapting rollbackTransaction() to a deleteAll() method

might also need to borrow some from rollbackInternal() (not sure if i see rollbackTransaction rolling back any documents buffered in memory)

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722649#action_12722649 ] 

Tim Smith commented on LUCENE-1705:
-----------------------------------

It seems like this method could be very simple (obviously i'm not an expert on IndexWriter code (yet))

seems like this is pretty close:
{code}
public synchronized void deleteAll() throws IOException {
  rollbackTransaction(); // might need to unroll this and not do everything here
  segmentInfos.clear();
  checkpoint(); // should we do this here? or just let the inevitable commit do this?
}
{code}


Alternatively, i suppose we could just copy rollbackTransaction to a deleteAll() method with the one change of not restoring the rollbackSegmentInfos (just leave it cleared)

This may result in doing less work (and be more self contained







> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1705.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9

Thanks Tim!

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722583#action_12722583 ] 

Tim Smith commented on LUCENE-1705:
-----------------------------------

I agree, it would be best if the deleteAll() could be rolled back and was just like any other add/update/delete operation on the index (provided it doesn't cause performance of the operation to degrade to a delete by query with a \*:\*)

That then greatly improves the current way that deleteAll() must be done (either sending a \*:\* to query delete method, or closing IndexWriter and reopening)



> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith updated LUCENE-1705:
------------------------------

    Attachment: DeleteAllFlushDocCountFix.patch

Here's a patch that fixes the deleteAll() + updateDocument() issue

just needed to set the FlushDocCount to 0 after aborting the outstanding documents

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: DeleteAllFlushDocCountFix.patch, IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722612#action_12722612 ] 

Michael McCandless commented on LUCENE-1705:
--------------------------------------------

bq. provided it doesn't cause performance of the operation to degrade to a delete by query with a *:*

The delete actually would not be with a *:* query; instead, we simple remove all referenced segments from the segments_N file.  So, it would be exceptionally fast.

Do you wanna cons up a patch?  It should be a refactoring of what's now done inside IndexWriter's init method...

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith updated LUCENE-1705:
------------------------------

    Attachment: TestIndexWriterDelete.patch

Here's a patch to TestIndexWriterDelete that shows the problem

after the deleteAll(), a document is added and a document is updated
the added document gets indexed, the updated document does not

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch, TestIndexWriterDelete.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1705.
----------------------------------------

    Resolution: Fixed

Thanks Tim!  I just committed this.

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: DeleteAllFlushDocCountFix.patch, IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith updated LUCENE-1705:
------------------------------

    Attachment: IndexWriterDeleteAll.patch

Here's a patch adding a deleteAll() method (based on internalRollback())

Patch includes tests.

deleteAll() method can be rolled back (if IndexWriter supports rollback)

This seems to work quite nicely

please comment if there are any edge cases not being handled.

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>         Attachments: IndexWriterDeleteAll.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Smith updated LUCENE-1705:
------------------------------

    Attachment:     (was: TestIndexWriterDelete.patch)

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: DeleteAllFlushDocCountFix.patch, IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (new IndexReader opening would get the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the "clear" as this seems to be the general recommendation for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close an IndexWriter and reopen a new one 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org